How Do You Solve This Problem “Error: Could not find or load main class org.apache.hadoop.util.VersionInfo”

Problem scenario
You run “hadoop version” but you receive this message “Error: Could not find or load main class org.apache.hadoop.util.VersionInfo”. What do you do?

Possible Solution #1
Prefix the command with “sudo”: run “sudo hadoop version”.

Possible Solution #2
Prefix the command with “sudo -i” so that it runs in root's login environment: run “sudo -i hadoop version”.

Possible Solution #3
Run the command as a different user, ideally one whose environment is already configured for Hadoop.
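
All three solutions change which user's environment runs the command. As a complementary check, the sketch below tests whether the current user's HADOOP_HOME points at an installation that contains the hadoop-common jar, which is where the VersionInfo class lives. It assumes the standard Apache tarball layout (share/hadoop/common); the helper name has_common_jar and the default path /usr/local/hadoop are illustrative assumptions.

```shell
# Hypothetical helper: succeed if a hadoop-common jar exists under the given install root
has_common_jar() {
  ls "$1"/share/hadoop/common/hadoop-common-*.jar >/dev/null 2>&1
}

# Assumption: /usr/local/hadoop is only a placeholder default for HADOOP_HOME
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}

if has_common_jar "$HADOOP_HOME"; then
  echo "hadoop-common jar found under $HADOOP_HOME"
else
  echo "hadoop-common jar not found under $HADOOP_HOME; check HADOOP_HOME for this user"
fi
```

If the jar is found for one user but not another, the difference in environment is a likely reason the command works only for some accounts.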


How Do You Get Hadoop Commands to Work from Any Directory without Using the Full Path?

Problem scenario
Hadoop is installed on Linux, but “hadoop version” and other hadoop commands do not work unless you use the full path. What should you do?

Solution
Find the hadoop executable in a directory named bin. It is often “/usr/local/hadoop/bin/hadoop”. Ultimately you need to find the directory that houses this “bin” subdirectory. To find a directory that has a “bin” subdirectory with “hadoop” inside, run these two commands:

sudo find / -name hadoop -type f
whereis hadoop

Run these commands interactively where “/usr/local/hadoop” is the directory that is the parent of the subdirectory named “bin” that is the parent of the hadoop executable.
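
One common form such commands take is a pair of exports that put the Hadoop bin directory on PATH. This is a minimal sketch, assuming the installation root is /usr/local/hadoop; substitute the directory that the find command reported on your system.

```shell
# Assumption: Hadoop is unpacked under /usr/local/hadoop; change this to the
# parent of the "bin" directory that the find command reported.
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Confirm the shell can now resolve the command
command -v hadoop || echo "hadoop not found on PATH; check HADOOP_HOME"
```

Exports typed interactively last only for the current session. Adding the same two lines to a shell startup file such as ~/.bashrc would make them persistent.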


How Do You Know if Hadoop is Installed (and the version if it is installed) on Linux SUSE?

Problem scenario
You are administering Linux SUSE machines. You want to see if Hadoop is installed on them. The command hadoop version does not work.

Solution
Run this command:

sudo find / -name hadoop -type f

From the results, you can probably identify the full path of the executable. It will likely not be in /var/ or /tmp/. Run that full path followed by “version” to see which version is installed.
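
The search and the version check can be combined. The sketch below is one way to do that; the helper name find_hadoop is hypothetical, and the search is narrowed to /usr/local only as an assumption to keep it fast (pass / to search the whole filesystem, with sudo if needed).

```shell
# Hypothetical helper: print the first executable file named "hadoop" that sits
# in a "bin" directory under the given search root (default: /).
find_hadoop() {
  find "${1:-/}" -type f -name hadoop -path '*/bin/hadoop' -perm -u+x 2>/dev/null | head -n 1
}

# Assumption: search /usr/local first; use / to search everything
hadoop_bin=$(find_hadoop /usr/local)

if [ -n "$hadoop_bin" ]; then
  # Call the executable by its full path so PATH does not matter
  "$hadoop_bin" version
else
  echo "No hadoop executable found"
fi
```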


How Do You Troubleshoot the HDFS Error “failed on connection exception: java.net.ConnectException: Connection refused;”?

Problem scenario
You have a multi-node Hadoop cluster running Hadoop version 3. You run this command: hdfs dfsadmin -report

You receive an error that includes this message: “failed on connection exception: java.net.ConnectException: Connection refused;”

What should you do?

Potential Solution
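Run these three commands. Be aware that formatting the NameNode erases the existing HDFS metadata, so do this only on a new cluster or one whose data you can discard: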

bash /usr/local/hadoop/sbin/stop-dfs.sh
hdfs namenode -format
bash /usr/local/hadoop/sbin/start-dfs.sh
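
Before reformatting, it may be worth confirming which address clients try to reach and whether a NameNode process is running at all. This is a rough sketch: the helper name namenode_address is hypothetical, and the checks run only if the hdfs command is on PATH.

```shell
# Hypothetical helper: extract "host:port" from an fs.defaultFS value
# such as hdfs://namenode1:9000
namenode_address() {
  printf '%s\n' "$1" | sed -E 's#^[a-z]+://([^/]+).*#\1#'
}

if command -v hdfs >/dev/null 2>&1; then
  # Ask Hadoop which filesystem URI the client configuration points at
  fs_uri=$(hdfs getconf -confKey fs.defaultFS)
  echo "Clients connect to: $(namenode_address "$fs_uri")"
  # jps ships with the JDK and lists running Java processes
  jps | grep -i namenode || echo "No NameNode process is running on this host"
fi
```

If no NameNode process is listed, starting HDFS with start-dfs.sh may be enough to clear the error without formatting anything.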


How Do You Troubleshoot “Error: Could Not Create the Java Virtual Machine”?

Problem scenario
You ran a Hadoop command but you receive this error:

Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

What do you do?

Solution
Run “hadoop help” (or “hadoop --help”) to review the valid syntax. This error can occur when the command includes an incorrect flag.


What Database Does Elasticsearch Use?

Question
What SQL database underlies Elasticsearch?

Answer
There is no relational database underlying Elasticsearch.

An Elasticsearch cluster is composed of nodes. Data is organized into indexes, and each index is divided into shards that are distributed across the nodes. Shards have primary and replica copies, and the number of replicas is configurable. Each shard is essentially a Lucene index (according to this site).
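
To make the shard and replica relationship concrete, the sketch below builds the settings body that the create-index API accepts and sends it only if a cluster answers. The helper name index_settings, the URL http://localhost:9200, and the index name my-index are all assumptions chosen for illustration.

```shell
# Hypothetical helper: build index settings for a given shard and replica count
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Assumptions: the cluster listens on localhost:9200 and "my-index" is a placeholder name
ES_URL=${ES_URL:-http://localhost:9200}
body=$(index_settings 3 1)   # 3 primary shards, 1 replica copy of each
echo "$body"

# Create the index only if the cluster is reachable
if curl -s -o /dev/null --max-time 2 "$ES_URL"; then
  curl -s -X PUT "$ES_URL/my-index" -H 'Content-Type: application/json' -d "$body"
fi
```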


What is a Continuous Application?

Question
What is a Continuous Application?

Answer
The Databricks website says: “We define a continuous application as an end-to-end application that reacts to data in real-time.”

However, the more accurate term would probably be “continual application,” as there may be discrete moments when no data is coming in. Many streams can be interrupted. In fact, Structured Streaming, the part of Spark that is used in Databricks’ “continuous applications,” is based on microbatching (according to this site).


How Do You Troubleshoot the Installation of Apache Accumulo on Linux?

Problem scenario
You are trying to install open source Accumulo on Linux. You have two GB of swap space. You have installed Java, Hadoop, and Zookeeper. You have run the bootstrap_config.sh script for Accumulo 1.9.2.

You run this (and expect it to work): /bin/accumulo-1.9.2/bin/accumulo init

But you get this error:

OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, ...

How Do You Set Up a Multi-Node Cluster of Zookeeper?

Problem scenario
You want to set up Zookeeper with three nodes in AWS. What do you do?

Solution
1. Install Zookeeper on each of the servers. If you need assistance with this, see this posting.

2. Modify the zoo.cfg file on each of the servers. Add stanzas like these but substitute foobarX.amazonaws.com with the Public DNS name of each server:

server.1=foobar1.amazonaws.com:2888:3888
server.2=foobar2.amazonaws.com:2888:3888
server.3=foobar3.amazonaws.com:2888:3888
initLimit=5
syncLimit=5

3.

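
Independently of the steps listed here, each server in a Zookeeper ensemble also needs a myid file in its data directory so that it knows which “server.N” line refers to itself. A minimal sketch, assuming the data directory is /tmp/zookeeper (the value used in the sample zoo.cfg) and that this host is server 1; both defaults are placeholders to adjust per host.

```shell
# Assumptions: ZK_DATA_DIR matches the dataDir setting in zoo.cfg, and MY_ID is
# the N from this host's "server.N=" line. Both defaults are placeholders.
ZK_DATA_DIR=${ZK_DATA_DIR:-/tmp/zookeeper}
MY_ID=${MY_ID:-1}

# The myid file holds nothing but the numeric server id
mkdir -p "$ZK_DATA_DIR"
echo "$MY_ID" > "$ZK_DATA_DIR/myid"
cat "$ZK_DATA_DIR/myid"
```

Run it on each server with a different MY_ID (1, 2, and 3 for the three example hosts).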

How Do You Troubleshoot HBase Commands That Hang?

Problem scenarios
One of the following applies to you.

Problem #1
You run an HBase command but it hangs indefinitely, and there is no error. What could be the problem?

Problem #2
You run an HBase command but you see this:
“ERROR: KeeperErrorCode = ConnectionLoss for /hbase/master”

Solution
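Check whether Zookeeper has been started. HBase relies on Zookeeper to locate the master, so commands can hang or fail with ConnectionLoss when Zookeeper is not running.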
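
A quick way to test this is to see whether anything accepts connections on the Zookeeper client port. The sketch below assumes Zookeeper runs on the local host with the default client port 2181 and that nc (netcat) is installed; the helper name port_open is hypothetical.

```shell
# Hypothetical helper: succeed if something accepts TCP connections on host:port
port_open() {
  nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Assumptions: Zookeeper runs on this host with the default client port 2181
ZK_HOST=${ZK_HOST:-localhost}
ZK_PORT=${ZK_PORT:-2181}

if port_open "$ZK_HOST" "$ZK_PORT"; then
  echo "Zookeeper port $ZK_PORT is reachable on $ZK_HOST"
else
  echo "Nothing is listening on $ZK_HOST:$ZK_PORT; start Zookeeper (for example with zkServer.sh start)"
fi
```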
