What is a Continuous Application?

Question
What is a Continuous Application?

Answer
The Databricks website says: “We define a continuous application as an end-to-end application that reacts to data in real-time.”

The more accurate term would probably be “continual application,” because there may be discrete moments when no data is coming in, and many streams can be interrupted. In fact, Structured Streaming, the Spark component behind Databricks’ “continuous applications,” is based on micro-batching (according to this site).
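To illustrate the idea of micro-batching, here is a minimal Python sketch. It is not how Spark implements Structured Streaming; the function name micro_batches and the sample events are assumptions made up for this illustration. It groups timestamped events into fixed-length windows and shows that a window with no data simply produces no batch.

```python
from collections import defaultdict


def micro_batches(events, interval_seconds):
    """Group (timestamp, value) events into consecutive fixed-length windows."""
    batches = defaultdict(list)
    for timestamp, value in events:
        # Every event falls into the window that contains its timestamp.
        window_index = int(timestamp // interval_seconds)
        batches[window_index].append(value)
    # Return the batches in time order; windows without events are absent.
    return [batches[index] for index in sorted(batches)]


# Hypothetical events: nothing arrives between 2.0 and 3.0 seconds.
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
print(micro_batches(events, 1.0))
```

With a one-second interval, the events are processed as three small batches rather than one at a time, and the empty window between seconds 2 and 3 is skipped.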

 » Read more..

How Do You Troubleshoot the Installation of Apache Accumulo on Linux?

Problem scenario
You are trying to install open source Accumulo on Linux. You have two GB of swap space. You have installed Java, Hadoop, and Zookeeper. You have run the bootstrap_config.sh script for Accumulo 1.9.2.

You run this (and expect it to work): /bin/accumulo-1.9.2/bin/accumulo init

But you get this error:

OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one,  » Read more..

How Do You Set Up a Multi-Node Cluster of Zookeeper?

Problem scenario
You want to set up Zookeeper with three nodes in AWS. What do you do?

Solution
1. Install Zookeeper on each of the servers. If you need assistance with this, see this posting.

2. Modify the zoo.cfg file on each of the servers. Add stanzas like these, but replace foobarX.amazonaws.com with the Public DNS name of each server:

server.1=foobar1.amazonaws.com:2888:3888
server.2=foobar2.amazonaws.com:2888:3888
server.3=foobar3.amazonaws.com:2888:3888
initLimit=5
syncLimit=5

3.

 » Read more..
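If you have many servers, the server.N lines from step 2 above can be generated rather than typed. This is a minimal Python sketch; the function name server_stanzas is an assumption for illustration, and the host names are the same placeholders used above.

```python
def server_stanzas(hostnames, peer_port=2888, election_port=3888):
    """Build one zoo.cfg server line per host, numbered from 1."""
    return [
        f"server.{number}={host}:{peer_port}:{election_port}"
        for number, host in enumerate(hostnames, start=1)
    ]


# Placeholder names; substitute the Public DNS name of each server.
hosts = [
    "foobar1.amazonaws.com",
    "foobar2.amazonaws.com",
    "foobar3.amazonaws.com",
]
print("\n".join(server_stanzas(hosts)))
```

The output can be pasted into zoo.cfg on every node, which keeps the server numbering identical across the cluster.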

How Do You Troubleshoot HBase Commands That Hang?

Problem scenarios
One of the following applies to you.

Problem #1
You run an HBase command but it hangs indefinitely, and there is no error. What could be the problem?

Problem #2
You run an HBase command but you see this:
“ERROR: KeeperErrorCode = ConnectionLoss for /hbase/master”

Solution
Has Zookeeper been started?
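One way to answer that question from a script is to send Zookeeper's "ruok" four-letter command and look for the "imok" reply. The sketch below assumes the default client port of 2181 and a hypothetical function name, zookeeper_is_running. Be aware that newer Zookeeper releases only answer four-letter commands listed in 4lw.commands.whitelist, so a False result may mean the command is not whitelisted rather than that Zookeeper is down.

```python
import socket


def zookeeper_is_running(host="localhost", port=2181, timeout=3.0):
    """Return True only if the server answers "ruok" with "imok"."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = conn.recv(16)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    return reply == b"imok"


if __name__ == "__main__":
    print(zookeeper_is_running())
```

If this prints False, start Zookeeper (or check the whitelist setting) before retrying the HBase command.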

 » Read more..

How Do You Troubleshoot Cassandra when It Hangs on the Message “ColumnFamilyStore.java Initializing”?

Problem scenario
You start Cassandra with this command: ./bin/cassandra
You see one of the following messages:

INFO [MigrationStage:1] 2018-04-06 19:01:07,144 ColumnFamilyStore.java:391 – Initializing system_auth.resource_role_permissons_index
INFO [MigrationStage:1] 2018-04-06 19:01:07,163 ColumnFamilyStore.java:391 – Initializing system_auth.role_members

No progress is happening. What should you do?

Solution
Possible Solution #1. Try rebooting the server. This may resolve the problem.

 » Read more..

How Do You Troubleshoot the Message “ERROR: but there is no HDFS_DATANODE_USER defined.”?

Problem scenarios
One of the following applies to you.

Situation 1:
You run “start-dfs.sh” and it seems to work, but the “jps” command does not show that “DataNode” is running.

OR

Situation 2:
You run “sudo bash start-dfs.sh” but you receive this message:

ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined.
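For Situation 1, the check for “DataNode” in the “jps” output can be scripted. The following is a minimal Python sketch; the function names are assumptions for illustration, and the sample text is invented to show the "pid name" layout that jps prints, not output captured from a real cluster.

```python
import subprocess


def java_process_names(jps_output):
    """Extract the process names from jps output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def datanode_is_running():
    """Run jps (it must be on the PATH) and look for DataNode."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return "DataNode" in java_process_names(result.stdout)


# Invented sample in which the DataNode did not start.
sample = "4821 NameNode\n5102 SecondaryNameNode\n5377 Jps\n"
print("DataNode" in java_process_names(sample))
```

Call datanode_is_running() on the Hadoop server itself, as the same user that started the daemons, because jps only lists Java processes it is permitted to see.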

 » Read more..

What Do You Do when Cassandra Stalls on “Initializing IndexInfo”?

Problem scenario
When you start Cassandra you see a message such as this:

INFO [main] 2018-02-03 08:45:55,257 ColumnFamilyStore.java:389 – Initializing system.IndexInfo

What should you do?

Possible Solution #1
Try rebooting the server. This may resolve the problem.

Possible Solution #2
This next one is merely a workaround. It is not a best practice.

 » Read more..

How Do You Install Apache Rya on Any Distribution of Linux?

Problem Scenario
You want to install Apache Rya on Linux. What do you do?

Solution

Prerequisites
i. You need a server with at least 5 GB of total memory (RAM plus swap space). You can create swap space by following this posting. (Remember that 1 GB of RAM and 2 GB of swap space will be insufficient for installing Rya.)
ii.

 » Read more..
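To check the memory prerequisite above before you start, you could add up RAM and swap from /proc/meminfo. This Python sketch rests on some assumptions: the function names are made up for illustration, it treats 1 GB as 1024 * 1024 kB, and the sample text only mimics the layout of /proc/meminfo. A server sold as having 5 GB will usually report slightly less, so treat the result as a rough guide.

```python
def total_memory_gb(meminfo_text):
    """Sum MemTotal and SwapTotal (reported in kB) and convert to GB."""
    totals = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "SwapTotal"):
            totals[key] = int(rest.split()[0])
    return sum(totals.values()) / (1024 * 1024)


def has_enough_memory(path="/proc/meminfo", minimum_gb=5):
    """Read the live values on a Linux server and compare to the minimum."""
    with open(path) as handle:
        return total_memory_gb(handle.read()) >= minimum_gb


# Invented sample: 3 GB of RAM plus 2 GB of swap space.
sample = "MemTotal:        3145728 kB\nMemFree:          524288 kB\nSwapTotal:       2097152 kB\n"
print(total_memory_gb(sample))
```

On the server itself, has_enough_memory() reads the real values instead of the sample.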

How Do You Use Google’s Cloud Pub/Sub with Python?

Problem scenario
You want to use a data analytics or big data tool that can publish messages and subscribe to messages that others publish. You know GCP has a Pub/Sub tool. You know it supports synchronous and asynchronous messaging. How do you use it with Python?

Solution

  1. Log into GCP via the web UI.
  2. Go here: https://console.cloud.google.com/cloudpubsub/
  3. Click “Create Topic”.

 » Read more..
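As an alternative to the web UI steps above, a topic can also be created from Python. This is a hedged sketch, not the procedure from the full posting. It assumes you have installed the google-cloud-pubsub package and set up application default credentials; the function names and the project and topic IDs are placeholders invented for illustration.

```python
def topic_path(project_id, topic_id):
    """Build the fully qualified name that Pub/Sub uses for a topic."""
    return f"projects/{project_id}/topics/{topic_id}"


def create_topic(project_id, topic_id):
    """Create the topic and return its full name."""
    # Imported here so the helper above works without the client library.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic = publisher.create_topic(request={"name": topic_path(project_id, topic_id)})
    return topic.name


def publish_message(project_id, topic_id, text):
    """Publish one message and return the message ID assigned by the server."""
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    future = publisher.publish(topic_path(project_id, topic_id), data=text.encode("utf-8"))
    return future.result()


# Placeholder IDs; replace them with your own project and topic.
print(topic_path("my-project", "my-topic"))
```

Calling create_topic("my-project", "my-topic") followed by publish_message("my-project", "my-topic", "hello") would contact GCP, so run those only against a project you control.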

How Do You Run Some Cassandra Commands to Create a Table?

Problem scenario
You want to create a table in Cassandra. How do you do this?

Solution
Prerequisites
Install and configure Cassandra. If you do not know how, click on the link for the distribution of Linux that you have:

Debian or Ubuntu
CentOS/RHEL/Fedora
SUSE

Procedures
You will have to create a keyspace,

 » Read more..
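To give a rough idea of what creating a keyspace and a table looks like, here is a Python sketch that builds the CQL statements. It is not the procedure from the full posting. The keyspace name, table name, columns, and replication settings are all hypothetical, and the run_statements function assumes the separately installed cassandra-driver package and a node listening on 127.0.0.1.

```python
def create_keyspace_cql(keyspace, replication_factor=1):
    """CQL for a single-datacenter test keyspace (SimpleStrategy)."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


def create_table_cql(keyspace, table):
    """CQL for a small example table with an integer primary key."""
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
        "(user_id int PRIMARY KEY, name text);"
    )


def run_statements(statements, host="127.0.0.1"):
    """Execute the statements against a running node."""
    # Imported here so the string helpers work without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster([host])
    session = cluster.connect()
    for statement in statements:
        session.execute(statement)
    cluster.shutdown()


# Hypothetical names used only for this example.
print(create_keyspace_cql("demo_keyspace"))
print(create_table_cql("demo_keyspace", "users"))
```

The printed statements can also be pasted into cqlsh directly if you prefer not to use the Python driver.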