How Do You Troubleshoot an Empty Multi-node Hadoop Cluster?

Problem scenario
One or more of the following is happening:
   1) An error message reports that there are 0 DataNodes in your Hadoop cluster.
   2) The configured capacity is 0 B (as shown by the "hdfs dfsadmin -report" command).
   3) There is one fewer DataNode in your Hadoop cluster than you expect.
   4) You run "hdfs dfsadmin -report | grep Hostname" and do not see a node whose DataNode service (as seen with the jps command) starts and stops from the NameNode when you run the start-dfs.sh and stop-dfs.sh scripts.
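Before you wipe anything, it can help to confirm the symptom from the NameNode. Here is a minimal check; it assumes the hadoop binaries are on your PATH and that jps (which ships with the JDK) is available:

   # Run on the NameNode. jps lists the running Java daemons; you should
   # see NameNode here and DataNode on each worker node.
   jps

   # Summarize the cluster. With no live DataNodes, the configured
   # capacity shows as 0 B and the missing node has no "Hostname:" line.
   hdfs dfsadmin -report | grep -E "Hostname|Configured Capacity"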

What should you do?

Possible solution
Warning: This procedure will delete all data in your Hadoop cluster.

1.  Go to the node that should be a DataNode (it could be the NameNode itself) and do the following three steps (a consolidated command sketch follows this list):
     1.1  Delete everything in /app/hadoop/tmp (including subdirectories).
     1.2  Are you ready to delete all of the content/data in your cluster? If so, run this command without the leading "#" mark (it is there so you cannot paste the line blindly): # hdfs namenode -format   # all of your data will be deleted permanently
     1.3  Run bash /usr/local/hadoop/sbin/start-dfs.sh
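The usual reason this fixes an "empty" cluster is that a previous "hdfs namenode -format" gave the NameNode a new clusterID while the DataNodes' storage directories kept the old one, so the DataNodes refuse to register; clearing the storage directories and reformatting puts everything back in sync. Here is the sequence as a runnable sketch. It assumes hadoop.tmp.dir is /app/hadoop/tmp and Hadoop is installed under /usr/local/hadoop, as above; check core-site.xml and adjust the paths if yours differ. Stopping HDFS first is an added precaution so that no daemon still holds the files being deleted.

   # Run on the NameNode. Stop HDFS first (an added precaution).
   bash /usr/local/hadoop/sbin/stop-dfs.sh

   # 1.1 Clear the old HDFS storage directories. Repeat this step on
   #     every DataNode so no stale clusterID is left behind.
   rm -rf /app/hadoop/tmp/*

   # 1.2 Reformat HDFS -- this permanently deletes all data in the cluster
   hdfs namenode -format

   # 1.3 Start HDFS again, then confirm the DataNodes have registered
   bash /usr/local/hadoop/sbin/start-dfs.sh
   hdfs dfsadmin -report | grep -E "Hostname|Configured Capacity"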

If you are still having problems, you may want to view this posting.
