How Do You Get All of Your Live Nodes to Appear in the "hdfs dfsadmin -report" Results?

Problem scenario
You use the "hdfs dfsadmin -report" command on your Hadoop NameNode server.  You see only a small amount of "DFS Remaining" when you expect many more GB.  You also see that one DataNode is not listed under the "Live datanodes" section; not all of your live DataNodes are appearing in HDFS.

Your /usr/local/hadoop/etc/hadoop/slaves file has the DNS name of a server you expect to be a DataNode, yet you see no reference to this DataNode in the "hdfs dfsadmin -report" output.
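To see at a glance which DataNodes the NameNode considers live, you can filter the report output. The snippet below is a sketch that runs the filter against a canned sample of report text so it can be tried offline; the hostname and address are placeholders, not values from your cluster:

```shell
# Hypothetical excerpt of "hdfs dfsadmin -report" output, stored in a
# variable so the parsing pipeline can be demonstrated without a cluster.
report='Live datanodes (1):
Name: 10.0.0.5:50010 (datanode1.example.com)
Hostname: datanode1.example.com'

# Print the hostname of every DataNode listed as live in the report.
echo "$report" | awk -F': ' '/^Hostname:/ {print $2}'
```

On a real cluster you would pipe the command itself, e.g. `hdfs dfsadmin -report | awk -F': ' '/^Hostname:/ {print $2}'`, and compare the result against the contents of your slaves file.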

When you run the start script from the NameNode server, you see the DataNode service start up on a slave server in the multi-node Hadoop cluster.  Likewise, when you run the stop script, you see the DataNode service disappear from the back end of the slave server itself.  The NameNode (or master Hadoop server) controls the DataNode service on that specific slave server perfectly.

But when you run "sudo /usr/local/hadoop/bin/hdfs dfsadmin -report" (or "hdfs dfsadmin -report"), you see only one live DataNode, and it is NOT the one described above.  You have at least two DataNodes.  What can be done to get this other node to appear in the report?  You want this additional DataNode to be live.  You know the start and stop scripts control the server well, but you do not know why the dfsadmin report will not list this server.

What can you do to add another DataNode and have it participate in the HDFS cluster, or otherwise get the report to accurately show which DataNodes are running?

This assumes you have a multi-node deployment of Hadoop.  If you do not know how to deploy this, see these directions.

Root cause
An intermediate firewall is blocking certain ports, but not others, between the DataNode in question and the NameNode.  If every port were blocked, the DataNode service would not start and stop in response to NameNode operations (e.g., running the start and stop scripts); some ports must therefore be open.  The problem of the DataNode not being listed as "live" when you run "hdfs dfsadmin -report" on the NameNode, even though the start/stop control works, is attributable to ports 54310 and 50010 being blocked.
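You can confirm this root cause by probing the two ports from the DataNode. The sketch below assumes a bash shell with /dev/tcp support; "namenode.example.com" is a placeholder for your NameNode's DNS name:

```shell
# Report whether a TCP port on a remote host is reachable from this host.
check_port() {
  # /dev/tcp/<host>/<port> is a bash pseudo-device; the redirect fails
  # if the TCP connection cannot be established within the timeout.
  if timeout 3 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null; then
    echo "open"
  else
    echo "blocked"
  fi
}

# Probe the two Hadoop ports on the NameNode (placeholder hostname).
for port in 54310 50010; do
  echo "port $port: $(check_port namenode.example.com "$port")"
done
```

If either port reports "blocked" while, say, the SSH port reports "open", that matches the partially blocked firewall described above.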

Solution
From the DataNode server, ensure that outbound connections to ports 54310 and 50010 on the NameNode server (aka the master server) are unblocked.

On the NameNode server, ensure that inbound connections on ports 54310 and 50010 from the DataNode server are unblocked.
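If the blocking firewall happens to be iptables on the NameNode itself, rules along these lines would open the two ports. This is only a sketch: the DataNode address 10.0.0.4 is a placeholder, and your environment may use a different firewall entirely:

```shell
# Allow the DataNode (placeholder address 10.0.0.4) to reach the
# NameNode's IPC port (54310) and the DataNode transfer port (50010).
sudo iptables -A INPUT -p tcp -s 10.0.0.4 --dport 54310 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 10.0.0.4 --dport 50010 -j ACCEPT

# List the rules to verify they were added.
sudo iptables -L INPUT -n
```

Remember that iptables rules added this way do not survive a reboot unless you persist them with your distribution's mechanism (e.g., iptables-save).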

Once the ports are opened, you may need to stop and start the DFS services again by running the scripts in /usr/local/hadoop/sbin/ with "sudo bash".

The nmap utility is useful for investigating this problem closely (e.g., "nmap -p 54310,50010 <namenode>" run from the DataNode server).

If you have the NameNode running in Azure, you will need to manually add an inbound security rule to the relevant Network Security Group.  In our experience, opening the security rule for any source port and any destination port makes the configuration much simpler.
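The inbound rule can also be added with the Azure CLI instead of the portal. The command below is a sketch restricted to the two Hadoop ports; the resource group, NSG name, rule name, priority, and source address are all placeholders you would replace with your own values:

```shell
# Add an inbound NSG rule allowing the DataNode (placeholder 10.0.0.4)
# to reach ports 54310 and 50010 on the NameNode's network interface.
az network nsg rule create \
  --resource-group my-rg \
  --nsg-name namenode-nsg \
  --name allow-hadoop-datanode \
  --priority 200 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes 10.0.0.4 \
  --source-port-ranges '*' \
  --destination-port-ranges 54310 50010
```

If you prefer the simpler any-port configuration described above, you could pass '*' for --destination-port-ranges as well, at the cost of a broader rule.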
