How Do You Troubleshoot the HDFS Error “failed on connection exception: java.net.ConnectException: Connection refused;”?

Problem scenario
You have a multi-node Hadoop cluster running Hadoop version 3. You run this command: hdfs dfsadmin -report

You receive an error that includes this message: “failed on connection exception: java.net.ConnectException: Connection refused;”

What should you do?

Potential Solution
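Run these three commands. Be aware that hdfs namenode -format erases the NameNode's existing metadata, so use this only on a cluster whose HDFS data you can afford to lose: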

bash /usr/local/hadoop/sbin/stop-dfs.sh
hdfs namenode -format
bash /usr/local/hadoop/sbin/start-dfs.sh
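
“Connection refused” generally means that nothing is accepting connections on the address the client tried. The following is a minimal Python sketch for checking that. The host localhost and port 9000 are assumptions for illustration only; use the host and port from the fs.defaultFS setting in your own core-site.xml. The function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: replace with the NameNode address from fs.defaultFS.
    host, port = "localhost", 9000
    if port_is_open(host, port):
        print(f"{host}:{port} is accepting connections")
    else:
        print(f"{host}:{port} is refusing connections or is unreachable")
```

If the port is closed, the NameNode process is probably not running or is bound to a different address than the one the client is using.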


How Do You Troubleshoot “Error: Could Not Create the Java Virtual Machine”?

Problem scenario
You ran a Hadoop command but you receive this error:

Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

What do you do?

Solution
Run hadoop help to review the valid options. This error can happen when the command includes an incorrect flag.


What Database Does Elasticsearch Use?

Question
What SQL database underlies Elasticsearch?

Answer
There is no relational database underlying Elasticsearch.

An Elasticsearch cluster is composed of nodes. Data is stored in indexes, and each index is divided into shards that are distributed across the nodes. Each shard has a primary copy and zero or more replica copies; the number of replicas is configurable. Each shard is essentially a Lucene index (according to this site).


What is a Continuous Application?

Question
What is a Continuous Application?

Answer
The Databricks website says this: “We define a continuous application as an end-to-end application that reacts to data in real-time.”

The more precise term would probably be “continual application,” however, as there may be discrete moments when no data is coming in. Many streams can be interrupted. In fact, Structured Streaming, the Spark component used in Databricks’ “continuous applications,” is based on micro-batching (according to this site).
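
To make the micro-batching idea concrete, here is a toy Python sketch. It is not Spark's API or its implementation; the function micro_batches, the event tuples, and the one-second interval are all hypothetical and chosen only for illustration. It groups time-ordered events into fixed intervals, and intervals in which no data arrives simply produce no batch.

```python
from itertools import groupby


def micro_batches(events, interval):
    """Group time-ordered (timestamp, payload) events into per-interval batches.

    Yields (interval_index, payloads). Intervals with no events yield nothing.
    """
    for window, group in groupby(events, key=lambda e: int(e[0] // interval)):
        yield window, [payload for _, payload in group]


if __name__ == "__main__":
    # Hypothetical events: (arrival time in seconds, payload).
    events = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (4.5, "d")]
    for window, batch in micro_batches(events, 1.0):
        print(window, batch)
```

With these sample events, batches are produced for intervals 0, 1, and 4 only; nothing arrives during intervals 2 and 3, which is the kind of gap the term “continual” would acknowledge.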


How Do You Troubleshoot the Installation of Apache Accumulo on Linux?

Problem scenario
You are trying to install open source Accumulo on Linux. You have two GB of swap space. You have installed Java, Hadoop, and Zookeeper. You have run the bootstrap_config.sh script for Accumulo 1.9.2.

You run this (and expect it to work): /bin/accumulo-1.9.2/bin/accumulo init

But you get this error:

OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, ...

How Do You Set Up a Multi-Node Cluster of Zookeeper?

Problem scenario
You want to set up Zookeeper with three nodes in AWS. What do you do?

Solution
1. Install Zookeeper on each of the servers. If you need assistance with this, see this posting.

2. Modify the zoo.cfg file on each of the servers. Add stanzas like these, but replace foobarX.amazonaws.com with the public DNS name of each server:

server.1=foobar1.amazonaws.com:2888:3888
server.2=foobar2.amazonaws.com:2888:3888
server.3=foobar3.amazonaws.com:2888:3888
initLimit=5
syncLimit=5

3.


How Do You Troubleshoot HBase Commands That Hang?

Problem scenarios
One of the following applies to you.

Problem #1
You run an HBase command but it hangs indefinitely, and there is no error. What could be the problem?

Problem #2
You run an HBase command but you see this:
“ERROR: KeeperErrorCode = ConnectionLoss for /hbase/master”

Solution
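Has Zookeeper been started? HBase clients locate the master through Zookeeper, so commands can hang or fail with ConnectionLoss when Zookeeper is not running.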
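
One way to check is to send Zookeeper the four-letter command srvr over its client port and see whether anything replies. The sketch below assumes Zookeeper listens on localhost at port 2181 (the conventional clientPort; yours may differ in zoo.cfg). The function name zookeeper_srvr is hypothetical.

```python
import socket


def zookeeper_srvr(host: str, port: int = 2181, timeout: float = 3.0):
    """Send the 'srvr' four-letter command to Zookeeper.

    Returns the reply text, or None if the server cannot be reached.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # The server closes the connection after replying.
                    break
                chunks.append(data)
            return b"".join(chunks).decode("utf-8", errors="replace")
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed address: replace with a host and clientPort from your zoo.cfg.
    reply = zookeeper_srvr("localhost", 2181)
    print(reply if reply is not None else "No response: Zookeeper may not be running")
```

Any reply indicates that a Zookeeper process is listening. No reply suggests that it has not been started or that the host or port is wrong.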


How Do You Troubleshoot Cassandra when It Hangs on the Message “ColumnFamilyStore.java Initializing”?

Problem scenario
You start Cassandra with this command: ./bin/cassandra
You see one of the following messages:

INFO [MigrationStage:1] 2018-04-06 19:01:07,144 ColumnFamilyStore.java:391 - Initializing system_auth.resource_role_permissons_index
INFO [MigrationStage:1] 2018-04-06 19:01:07,163 ColumnFamilyStore.java:391 - Initializing system_auth.role_members

No progress is happening. What should you do?

Solution
Possible Solution #1. Try rebooting the server. This may resolve the problem.


How Do You Troubleshoot the Message “ERROR: but there is no HDFS_DATANODE_USER defined.”?

Problem scenarios
One of the following applies to you.

Situation 1:
You run “start-dfs.sh” and it seems to work, but the “jps” command does not show that “DataNode” is running.

OR

Situation 2:
You run “sudo bash start-dfs.sh” but you receive this message:

ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined.


What Do You Do when Cassandra Stalls on “Initializing IndexInfo”?

Problem scenario
When you start Cassandra you see a message such as this:

INFO [main] 2018-02-03 08:45:55,257 ColumnFamilyStore.java:389 - Initializing system.IndexInfo

What should you do?

Possible Solution #1
Try rebooting the server. This may resolve the problem.

Possible Solution #2
This next one is merely a workaround. It is not a best practice.
