How Do You Solve the Problem of Being Prompted for Credentials after You Run the start-dfs.sh Script?

Problem scenario
You try to start Hadoop's dfs in a multi-node deployment.  All the Hadoop nodes are running Linux.  You run this:

bash /usr/local/hadoop/sbin/start-dfs.sh

You see this:

Starting namenodes on [hadoopmaster]
hadoopmaster: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoopmaster.out
root@hadoopdatanode's password: localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoopmaster.out
hadoopmaster: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoopmaster.out
localhost: ulimit -a for user root
localhost: core file size          (blocks, -c) 0
localhost: data seg size           (kbytes, -d) unlimited
localhost: scheduling priority             (-e) 0
localhost: file size               (blocks, -f) unlimited
localhost: pending signals                 (-i) 2544
localhost: max locked memory       (kbytes, -l) 64
localhost: max memory size         (kbytes, -m) unlimited
localhost: open files                      (-n) 1024
localhost: pipe size            (512 bytes, -p) 8

You notice that DFS is not running properly on the datanode (aka slave node).  You think the password prompt for "root@hadoopdatanode" (the Hadoop slave server) is a sign of the problem.  What do you do?

Solution
If you want to passwordlessly SSH as a given user from one server to another, the remote server's authorized_keys must have the public key from the local server (server from which you attempt to initiate the SSH connection).

In this case you are being prompted for a password.  The remote server would not do this if its /root/.ssh/authorized_keys contained the contents of the id_rsa.pub file for the root user on the starting (local) server.  The starting server is the Hadoop master server.

From the Hadoop namenode (aka master) server, do this:

cat /root/.ssh/id_rsa.pub

Copy the output to the last line of /root/.ssh/authorized_keys on the destination (non-master) Hadoop datanode server.
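
The copy step above can also be scripted on the datanode.  The following is a minimal sketch, not part of the original procedure: the function name append_public_key is hypothetical, and it assumes you have already transferred the public key text to the datanode by some means.  It appends the key only if it is missing and sets the file to rw------- permissions.

```python
import os
from pathlib import Path


def append_public_key(authorized_keys: Path, public_key: str) -> bool:
    """Append public_key to authorized_keys unless it is already present.

    Returns True if the key was added, False if it was already there.
    (Hypothetical helper for illustration.)
    """
    key = public_key.strip()
    # Create ~/.ssh if it does not exist yet.
    authorized_keys.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    content = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in content.splitlines()):
        added = False
    else:
        # Start on a fresh line in case the file lacks a trailing newline.
        separator = "" if content == "" or content.endswith("\n") else "\n"
        with authorized_keys.open("a") as handle:
            handle.write(separator + key + "\n")
        added = True
    # Keep the file readable and writable by its owner only.
    os.chmod(authorized_keys, 0o600)
    return added
```

On the datanode you might call it as append_public_key(Path("/root/.ssh/authorized_keys"), key_text), where key_text holds the output of the cat command above.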

How Do You Set up a Multi-Node Cluster of Hadoop with Linux Servers?

Updated 1/30/18

Problem scenario

You want to deploy open-source Hadoop as a multi-node cluster to some Linux servers (e.g., two or more servers as opposed to a single-server deployment).  You want to use AWS, Azure and/or Google Cloud Platform Linux servers, possibly in a cross-cloud configuration (using two or three public cloud services).  What do you do to install and configure Hadoop to leverage two or more computers in a potentially hybrid cloud environment?

Solution
These directions work with all the servers being in AWS, Azure, Google Cloud Platform or a mix of each of them!  (These directions should work for a hybrid cloud deployment without much extra configuration for you.  We recommend using nmap and being methodical with checking ports very early in the process if you are deploying a hybrid cloud Hadoop multi-node cluster.)  These directions work regardless of what type of Linux you are using (e.g., CentOS/RedHat/Fedora, Debian/Ubuntu, or SUSE).  It can take less than 15 minutes if there are not too many nodes and the nodes are not too big.

You need just two Linux servers: one for the master server (referred to as the NameNode) and one for the slave node (referred to as the DataNode).  These directions were designed for open-source Hadoop.  The relevant security groups (for AWS, or NSGs for Azure, or firewall rules for Google Cloud Platform) need to allow connectivity between the two clouds.  If you are going to deploy it using two clouds, see the "Tips for deploying a multi-node cluster" at the bottom.

1.  Install Hadoop on at least two different Linux servers.  If you need directions, there is an article for each of the three major distributions of Linux (CentOS/RedHat/Fedora, Debian/Ubuntu and SUSE).  For these directions below you will need a group called 'hadoop' and a user called 'hduser' on every NameNode and DataNode.  (These names could be different depending on your specific needs.  If you follow the hyperlinked article above, you will create this group and user so that the rest of these directions below will work.)  Remember your hduser account passwords.  Mentally designate one Linux server as the NameNode (previously referred to as the "master").  The other server(s) will be the DataNodes (previously known as slave(s)).

2.a.  Configure the /etc/hosts file on the NameNode (master) server and on each DataNode (slave) server.  The NameNode server's /etc/hosts file must have an entry for each DataNode.  Each DataNode's /etc/hosts file must have one entry for the NameNode server.  The IP address you use depends on whether or not the server you are adding to the /etc/hosts file is in the same cloud.  If the NameNode and DataNode are in the same cloud, use the internal IP address of the server you are entering.  For heterogeneous clouds (e.g., Azure having a NameNode and AWS having a DataNode), use the corresponding external IP address in the /etc/hosts file whenever the entry refers to a server in a different cloud.
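
The internal-versus-external rule in step 2.a can be sketched in a few lines of Python.  This is only an illustration: the Node class, the hosts_entries function, the host names and the IP addresses are all made up for the example and are not part of Hadoop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    hostname: str
    cloud: str        # e.g. "aws", "azure" or "gcp"
    internal_ip: str
    external_ip: str


def hosts_entries(local: Node, peers: List[Node]) -> List[str]:
    """Return the /etc/hosts lines that `local` needs for each peer."""
    lines = []
    for peer in peers:
        # Same cloud: internal address.  Different cloud: external address.
        ip = peer.internal_ip if peer.cloud == local.cloud else peer.external_ip
        lines.append(f"{ip}\t{peer.hostname}")
    return lines


# Example values only; substitute the addresses of your own servers.
namenode = Node("hadoopmaster", "azure", "10.0.0.4", "203.0.113.10")
datanodes = [
    Node("datanode1", "azure", "10.0.0.5", "203.0.113.11"),
    Node("datanode2", "aws", "172.31.0.7", "198.51.100.20"),
]
for entry in hosts_entries(namenode, datanodes):
    print(entry)
```

The printed lines are what you would append to the NameNode's /etc/hosts file.  Calling hosts_entries with a DataNode as the first argument and the NameNode in the list gives the entry for that DataNode's file.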

2.b.  Ensure that the AWS Security Group, or Azure NSG, or Google Cloud Platform firewall rule does not restrict traffic between the NameNode and DataNode.  For AWS Security Group configuration, the IP addresses will be internal if they are all in the same AWS VPC and governed by the same Security Group.  If NameNode is in a heterogeneous public cloud relative to the DataNode, then external IP addresses will be needed for Rules to allow the servers to communicate.

3.  The NameNode must be able to one, passwordlessly ssh to itself with hduser; two, passwordlessly ssh to the DataNode(s) with hduser.  For the second one, from the NameNode server, run these commands:

su hduser
cat /home/hduser/.ssh/id_rsa.pub

# If the above is not found, you may need to run this: ssh-keygen -t rsa -P ""  # press enter to the next prompt

4.  Copy the output of the last command above.  On each DataNode server, append it to /home/hduser/.ssh/authorized_keys.  After you save the file, ensure that /home/hduser/.ssh/authorized_keys has rw------- permissions.  Use chmod 600 authorized_keys if necessary.

5.  On the Hadoop NameNode server, modify this file /usr/local/hadoop/etc/hadoop/workers
Add a new line with the domain name of each DataNode server.  It will look like this:

coolnameofserver

Remember to save the file.

6.   Modify the core-site.xml file (e.g., often found at /usr/local/hadoop/etc/hadoop/core-site.xml).  Find the <property> element and the <value> element inside it.  Change the "localhost" to the domain name of the NameNode server itself.
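
If you would rather script the edit in step 6, here is a rough sketch using Python's standard xml.etree.ElementTree module.  The function name point_at_namenode is hypothetical, and the sample XML (the fs.defaultFS property name and port 54310) is an assumption about what your core-site.xml contains; check your own file first.  Note that ElementTree drops XML comments and the XML declaration when it re-serializes, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def point_at_namenode(core_site_xml: str, namenode_host: str) -> str:
    """Replace "localhost" inside every <value> element with namenode_host."""
    root = ET.fromstring(core_site_xml)
    for value in root.iter("value"):
        if value.text and "localhost" in value.text:
            value.text = value.text.replace("localhost", namenode_host)
    return ET.tostring(root, encoding="unicode")


# Assumed sample content; your file may use different names and ports.
sample = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>"""

print(point_at_namenode(sample, "hadoopmaster"))
```

To apply it to a real file, read the file into a string, pass it to the function, and write the returned string back.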

7.  Modify /usr/local/hadoop/etc/hadoop/mapred-site.xml

In between the <configuration> and </configuration> tags, place this text with "NameNode" being replaced by the domain name of your NameNode server:

<property>
 <name>mapred.job.tracker</name>
  <value>NameNode:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

8.  Does the hdfs-site.xml (e.g., /usr/local/hadoop/etc/hadoop/hdfs-site.xml) have a dfs.replication value element that is correct?  The value is computed by adding the number of NameNode and DataNode servers that will be in the Hadoop cluster once the new node has been added.

Find the hdfs-site.xml file (e.g., /usr/local/hadoop/etc/hadoop/hdfs-site.xml).  Inside it find the "dfs.replication" element.  Underneath it will be a <value> number.  Modify it to be "2" (with no quotes) if you are configuring one NameNode and one DataNode Hadoop server.  The number should equal the total number of NameNode and DataNode servers in the Hadoop cluster you are configuring.

If you find no such element, copy this text inside the <configuration> and </configuration> tags (but replace the "2" with the sum of your NameNode and DataNodes in your desired Hadoop cluster):

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
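
The "modify it if present, otherwise add it" logic of step 8 could be sketched as follows with Python's standard xml.etree.ElementTree module.  The function name set_property is hypothetical, and the sketch works on an XML string so that nothing on disk changes until you decide to write the result back.  As with any ElementTree round trip, XML comments in the original file are not preserved.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Set a Hadoop-style <property> to value, adding it if it is missing."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            node = prop.find("value")
            if node is None:
                node = ET.SubElement(prop, "value")
            node.text = value
            break
    else:
        # No matching property was found, so append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# One NameNode plus one DataNode gives "2" under the rule described above.
print(set_property("<configuration></configuration>", "dfs.replication", "2"))
```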

9.  Repeat steps 6 through 8 for each remaining DataNode server.

10.  Start the multi-node instance. Log into the NameNode as hduser (e.g., su hduser).  Then run this command:  
bash /usr/local/hadoop/sbin/start-all.sh

Tips for deploying a multi-node cluster of Hadoop with both AWS and Azure
By default you will not be able to ping Azure instances.  But this does not mean SSH will not work.  ICMP packets are turned off by default in Azure.  

The datanodes can be in AWS as long as the AWS instances are governed by a Security Group which allows inbound connections originating from the Azure Hadoop NameNode.  To find its external IP address, go to the back end of the NameNode and run "curl http://icanhazip.com" (with no quotes).  The other requirement is that the Network Security Group in Azure for the Hadoop NameNode must allow communication from the AWS datanodes.

Tips for cluster deployments to hybrid clouds involving both AWS and Google Cloud Platform
The datanodes' DataNode service (as shown with the jps command) may be controlled by the NameNode, but the node may not be in the cluster unless the NameNode accepts connectivity over the port configured in the core-site.xml file.

You may want to see this posting for troubleshooting tips when adding a node to the cluster.

How Do You Run a MapReduce Job (with Python)?

Problem scenario
You have studied what a MapReduce job is conceptually.  But you want to try it out.  You have a Linux server with the Hadoop namenode (aka master node) installed on it.  How do you run a MapReduce job [to understand what it is all about]?

Solution
This is just an example.  We tried to make this as simple as possible.  This assumes that Python has been installed on the Hadoop namenode (aka master node) running on Linux.

1.  Deploy Hadoop if it has not been installed.  See this link to deploy Hadoop to a single server.  If you want to set up a multi-node cluster of Hadoop, see these directions.  These directions can work with any distribution of Linux, but we found that with CentOS it was unreliable.  (With CentOS we saw ResourceManager stopping inexplicably, excessive CPU consumption, or some other undefined problem that makes the MR jobs take too long.  We do not know why because CentOS is a great distribution of Linux.)

2.  Log into the Hadoop namenode (aka master), and then run this Bash program (to get files ready and prepare two Python programs).

#!/bin/bash
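# -p avoids an error if the directory already exists (e.g., when the script is run a second time)
mkdir -p /tmp/contint/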

curl http://www.gutenberg.org/files/2701/2701.txt > /tmp/contint/2701.txt
curl http://www.gutenberg.org/files/2694/2694.txt > /tmp/contint/2694.txt
curl https://www.gutenberg.org/files/74/74-0.txt > /tmp/contint/74-0.txt

echo "#!/usr/bin/env python3
# This file is mapper.py
# This was taken from https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
# There were slight modifications to it.

import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
                # write the results to STDOUT (standard output);
                # what we output here will be the input for the
                # Reduce step, i.e. the input for reducer.py
                #
                # tab-delimited; the trivial word count is 1
                print ('%s\t%s' % (word, 1))
" > /tmp/mapper.py

chmod +x /tmp/mapper.py

echo "#!/usr/bin/env python3
# This file is reducer.py
# This was mostly taken from https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None #this is a test

# input comes from STDIN
for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()

        # parse the input we got from mapper.py
        word, count = line.split('\t', 1)

        # convert count (currently a string) to int
        try:
                count = int(count)
        except ValueError:
                # count was not a number, so silently
                # ignore/discard this line
                continue

        # this IF-switch only works because Hadoop sorts map output
        # by key (here: word) before it is passed to the reducer
        if current_word == word:
                current_count += count
        else:
                if current_word:
                        # write result to STDOUT
                        print('%s\t%s' % (current_word, current_count))
                # start counting the new word (this must run even for the first word)
                current_count = count
                current_word = word

# do not forget to output the last word if needed!
if current_word == word:
        print('%s\t%s' % (current_word, current_count))
" > /tmp/reducer.py

chmod +x /tmp/reducer.py

echo "Script finished."
echo "***********************************************************************************"
echo ""
# Scripting the below is problematic on Ubuntu/Debian if the script is run via a "sudo" command.
# Plus the below is just an example. Some people may want to call the directory something else.
echo " Run these commands (or commands like these) to prepare a directory to run the hadoop commands:

su hduser
cd ~
mkdir placetowork
cp /tmp/reducer.py placetowork/
cp /tmp/mapper.py placetowork/
chown -R hduser:hadoop placetowork

# Then run these commands manually as the hduser in the placetowork subdirectory:
bash /usr/local/hadoop/sbin/start-dfs.sh
bash /usr/local/hadoop/sbin/start-yarn.sh
# run the above two commands again just in case. (sometimes it is necessary; you could use 'jps' to verify if they need to be run once again)
bash /usr/local/hadoop/sbin/start-dfs.sh
bash /usr/local/hadoop/sbin/start-yarn.sh

hdfs dfs -mkdir -p /user/hduser/contint
hdfs dfs -copyFromLocal /tmp/contint/* /user/hduser/

# The command below may need to use /home/hduser instead of /user/hduser
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.0.jar -file /tmp/mapper.py -mapper /tmp/mapper.py -file /tmp/reducer.py -reducer /tmp/reducer.py -input /user/hduser/* -output /user/hduser/contint-output -verbose
*************************************************************************************
"

3.  Find the hadoop-streaming*.jar file.  Run this command if you need to find it:

sudo find / -name "hadoop-streaming-*.jar"

4.  Optional step.  Run this command after you replace "/user/hduser" with the directory in your hdfs that you will write to:   hdfs dfs -ls /user/hduser

# This just tests you are able to use hdfs with the user you are logged in as.

5.  Run the command below after you replace "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.0.jar" with the path and file name of the hadoop-streaming*.jar file you have and after you replace "/user/hduser/" with the HDFS path your user has permissions to write to:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.0.jar -file /tmp/mapper.py -mapper /tmp/mapper.py -file /tmp/reducer.py -reducer /tmp/reducer.py -input /user/hduser/* -output /user/hduser/contint-output -verbose

6. Optional step. You can run this command to see the status of your job:

mapred job -list all
# if you have an old version of hadoop, try this command: hadoop job -list all
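
To see what the job computes without involving Hadoop at all, the map, sort and reduce phases can be imitated in plain Python.  This is only a local sketch of the word-count idea: the function names are made up for the example, and sorted() stands in for the shuffle/sort that Hadoop performs between mapper.py and reducer.py.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Mapper: emit (word, 1) for every word, as mapper.py does."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(pairs):
    """Reducer: add up the counts of each word, as reducer.py does."""
    # Sorting by key plays the role of Hadoop's shuffle/sort step.
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


sample = ["call me Ishmael", "call me back"]
for word, total in reduce_counts(map_words(sample)):
    print(f"{word}\t{total}")
```

The real job does the same thing, except that Hadoop splits the input across mappers, sorts the intermediate pairs, and feeds them to the reducers over standard input and output.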

FYI
The above directions were tested on Ubuntu 20.x, SUSE 15.x and CentOS 8.x Linux servers (as well as other types including different versions of RHEL).  With CentOS we saw some reliability problems.  We do think CentOS is a great OS however.

How Do You Get All of Your Live Nodes to Appear in The “hdfs dfsadmin -report” Results?

Problem scenario
You use the "hdfs dfsadmin -report" command on your Hadoop NameNode server.  You see that there is a small amount of "DFS Remaining."  You expect many more GB.  You also see one datanode not listed under the "Live datanodes" section.  Not all of your live datanodes are appearing in HDFS.

Your /usr/local/hadoop/etc/hadoop/slaves file has the DNS name of a server you expect to be a datanode.  You see no reference to this datanode in the "hdfs dfsadmin -report" output.

When you run start-dfs.sh from the NameNode server you see that the DataNode service starts up on a slave server in a multi-node Hadoop cluster.  Likewise when you run stop-dfs.sh you see the DataNode service disappear from the back-end of the slave server itself.  The NameNode (or Master Hadoop server) controls the datanode service perfectly on a specific slave server.

But when you run "sudo /usr/local/hadoop/bin/hdfs dfsadmin -report" (or hdfs dfsadmin -report) you see only one live datanode and it is NOT the one described above.  You have at least two data nodes.  What can be done to get this other node to appear in the report?  You want this additional datanode to be live.  You know the start-dfs.sh and stop-dfs.sh scripts control the server well.  But you do not know why the dfsadmin report will not show this server.

What can you do to add another datanode and have it participate in the hdfs cluster or otherwise get the report to accurately show which data nodes you have running?

Solution
Prerequisite
This assumes you have a multi-node deployment of Hadoop.  If you do not know how to deploy this, see these directions.

Root cause
An intermediate firewall is blocking certain ports but not other ports between the datanode in question and the namenode.  If every port were blocked, the DataNode service would not start and stop in response to NameNode operations (e.g., running the start-dfs.sh and stop-dfs.sh scripts).  Some ports would have to be open.  The problem of not having the datanode listed as "live" when you run "hdfs dfsadmin -report" on the namenode with corresponding functionality is attributable to ports 54310 and 50010 being blocked.

Procedure of Solution
From the datanode server, ensure that outbound ports 54310 and 50010 are unblocked to the name node server (aka master server).

To the namenode server, ensure that inbound ports 54310 and 50010 are unblocked from the datanode server (aka slave server).

You may need to run stop-dfs.sh and start-dfs.sh again once the ports are opened.  Commands like these may help you:  "sudo bash /usr/local/hadoop/sbin/stop-dfs.sh && sudo bash /usr/local/hadoop/sbin/start-dfs.sh"

Miscellaneous
The nmap utility is useful for checking out this problem closely.

If you have the NameNode running in Azure, you would need to manually add an inbound security rule in the relevant Network Security Group.  In our experience opening the security rule for any Source Port and any Destination Port makes the configuration much simpler.
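
If nmap is not available, a quick TCP reachability check of the ports named above can be sketched with Python's standard socket module.  The function names here are hypothetical, and "hadoopmaster" in the usage note is a placeholder for your NameNode's host name.  A successful connection only shows that something is listening and that no firewall dropped the connection; it does not prove that the listener is Hadoop.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports=(54310, 50010)) -> dict:
    """Map each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Run from the datanode, check_ports("hadoopmaster") returns a dictionary such as {54310: True, 50010: False}, which shows at a glance which port is still blocked.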

How Do You Know If Yarn or Hadoop’s NameNode Services Are Running?

Problem scenario
How do you know if YARN or Hadoop's NameNode services are running or not?

Solution
Use the "sudo jps" command.  Some Linux users may not have sufficient permissions to install it.  

If the command is not found, go to the option below for your distribution of Linux:

If you are running a RedHat derivative (e.g., Fedora, CentOS etc.), run this:  sudo yum -y install java-1.8.0-openjdk-devel
# you may prefer to use java-1.7.0-openjdk-devel

If you are running Ubuntu or a Debian distribution, run this:  sudo apt-get -y install openjdk-8-jdk
# you may prefer to use openjdk-7-jdk; the jps tool ships with the JDK packages, not the JRE packages

To interpret the output of a "sudo jps" command, here is an example of jps output where the NameNode is running but Yarn is not:

7521 SecondaryNameNode
7189 NameNode
7335 DataNode
7695 Jps

Here is an example of Yarn running but not the name node:

9027 GetConf
8004 NodeManager
7863 ResourceManager
9079 Jps
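
Reading jps output can also be automated.  The sketch below is an illustration only: the function names are made up, and it treats "YARN is running" as "a ResourceManager process is listed", which is an assumption that suits a master server.

```python
def running_services(jps_output: str) -> set:
    """Return the process names listed by jps, excluding jps itself."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <name>"; skip blank or partial lines.
        if len(parts) >= 2 and parts[1] != "Jps":
            names.add(parts[1])
    return names


def hdfs_and_yarn_status(jps_output: str) -> dict:
    """Report whether the NameNode and YARN's ResourceManager appear."""
    names = running_services(jps_output)
    return {
        "namenode": "NameNode" in names,
        "yarn": "ResourceManager" in names,
    }


example = """7521 SecondaryNameNode
7189 NameNode
7335 DataNode
7695 Jps"""
print(hdfs_and_yarn_status(example))
```

Because the check is an exact match on the name, "SecondaryNameNode" alone does not count as the NameNode running.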

How Do You Troubleshoot the Maven Build Failure of No “POM in this directory”?

Problem scenario
You need to use the following command on a CentOS/RedHat/Fedora Linux server in a /home/hadoop/ directory:

mvn package -Pdist,native -DskipTests -Dtar

The Maven build is failing.  When you run the command with the "-e" flag for verbosity you see this:

[INFO] BUILD FAILURE
...
[WARNING] The requested profile "dist" could not be activated because it does not exist.
[WARNING] The requested profile "native" could not be activated because it does not exist.
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/hadoop).

Please verify you invoked Maven from the correct directory. -> [Help 1]
...
org.apache.maven.lifecycle.MissingProjectException: The goal you specified requires a project to execute but there is no POM in this directory (/home/hadoop). Please verify you invoked Maven from the correct directory.
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:84)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
        at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)

What do you do to create a pom.xml file in the correct directory so you can run the original mvn command?

Solution
Run these commands:
cd /home/hadoop/

# In the commands below, replace "contint" with the name of your company or some other term you desire.
mvn archetype:generate -DgroupId=com.contint.app -DartifactId=contint -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

cp ./contint/pom.xml pom.xml

mvn package -Pdist,native -DskipTests -Dtar
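
Since the error message asks you to verify the directory, it may be worth checking whether a pom.xml already exists somewhere below /home/hadoop (for example, in an unpacked source tree) before or after generating a new one.  The following is a small sketch using Python's standard pathlib module; the function name find_pom_directories is hypothetical.

```python
from pathlib import Path
from typing import List


def find_pom_directories(root: Path) -> List[Path]:
    """Return every directory at or below root that contains a pom.xml."""
    return sorted(pom.parent for pom in root.rglob("pom.xml"))
```

For example, find_pom_directories(Path("/home/hadoop")) returns a list of directories; an empty list means that no pom.xml exists anywhere under that directory.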

How Do You Troubleshoot an hdfs Command with the Error “java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)…ConnectionRefused”?

Problem scenario
When trying to run an hdfs command you see this message:

"WARN ipc.Client: Failed to connect to server: localhost/127.0.0.1:54310: try once and fail.
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:681)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:777)
        at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:409)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1542)
        at org.apache.hadoop.ipc.Client.call(Client.java:1373)
        at org.apache.hadoop.ipc.Client.call(Client.java:1337)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:787)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
        at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700)
        at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1436)
        at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1433)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
        at org.apache.hadoop.fs.Globber.doGlob(Globber.java:269)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1685)
        at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
        at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
        at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
        at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:378)
ls: Call From contint/10.10.10.10 to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused"

What do you do?

Solution

See if the regular NameNode (not the SecondaryNameNode) is running.  Run this (as the hduser or something similar):  jps

You should see "NameNode" in the results.  If NameNode is not running, then you need to start it.

To start NameNode run this command:  bash /usr/local/hadoop/sbin/start-dfs.sh

If start-dfs.sh completes without errors but the problem remains, see one of these links: one, two, or three.

If start-dfs.sh is having issues and you are running a single-node cluster, can you SSH into the local host as root?  Can you then SSH to the localhost as root a second time?  If you cannot log in with SSH as the root user twice consecutively, see if getting that to work will allow start-dfs.sh to work.

If you can do both of these without any hdfs commands, then the links above should be what you try next:

ssh root@localhost
ssh root@localhost 

How Do You Pass Parameters Using Boto?

Problem scenario
You want to run a Boto program. But there is an error about necessary parameters not being assigned.

You see a message like this:

botocore.exceptions.ClientError: An error occurred (ValidationError) when calling the CreateStack operation: Parameters: [KeyName, DBPassword, DBUser] must have values

Where can you see an example of parameters being passed?

Solution
Updated on 3/15/21.
Here is a working program:

import boto3

cloudf = boto3.client('cloudformation', region_name='us-west-1',
                      aws_access_key_id='AKIAabcdefgh12345',
                      aws_secret_access_key='SOMETHING1234here/moretext')
with open('lampstack.json', 'r') as template_file:
    json_stack = template_file.read()
# Each parameter named in the error message needs its own entry in Parameters.
cloudf.create_stack(StackName='continualstack', TemplateBody=json_stack, Parameters=[
    {'ParameterKey': 'KeyName', 'ParameterValue': 'foobar'},
    {'ParameterKey': 'DBPassword', 'ParameterValue': 'verycont'},
    {'ParameterKey': 'DBUser', 'ParameterValue': 'contuser'},
    {'ParameterKey': 'DBRootPassword', 'ParameterValue': 'contpassword'},
])
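
When there are many parameters, it can be easier to keep them in an ordinary dictionary and convert it to the list-of-dictionaries shape that the program above passes as Parameters.  The helper below is a hypothetical convenience function, not part of Boto; the keys and values are the same sample values used above.

```python
def to_cfn_parameters(values: dict) -> list:
    """Convert {"Key": "Value"} pairs into ParameterKey/ParameterValue entries."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


params = to_cfn_parameters({
    "KeyName": "foobar",
    "DBPassword": "verycont",
    "DBUser": "contuser",
    "DBRootPassword": "contpassword",
})
print(params)
```

The resulting list can then be passed as the Parameters argument of create_stack in place of the literal list.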

How Do You Copy a File into HDFS without the Error “No such file or directory”?

Problem scenario
You want to add a file to Hadoop. You are trying to run a basic Hadoop command to copy a file into HDFS.  You get this error:  copyFromLocal: `hdfs://localhost:54310/user/...': No such file or directory

How do you copy a file from your OS into HDFS?

Solution
Do one of the following:
Option 1.  Run this command to create a new directory (substitute "jdoe" with the name of your user):

hdfs dfs -mkdir -p /user/jdoe/contint
# Now repeat your copy command

Option 2.  Use a directory you know exists for the destination file without renaming the file itself (substitute "/path/to/" with the directory path to the source file and substitute "foo.bar" with the file name you want copied):

hdfs dfs -copyFromLocal /path/to/foo.bar hdfs://localhost:54310/user/root/

With the -copyFromLocal flag, this does not work like the cp command in Linux, where the file name can be changed in the destination path.  The destination file name should not appear in the destination path argument.
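
The two options can be combined into one routine: create the destination directory, then copy into it.  The sketch below only builds and prints the command lines; the function name copy_commands is hypothetical, and the paths are the same placeholders used above.  shlex.join requires Python 3.8 or later.

```python
import shlex


def copy_commands(local_file: str, hdfs_dir: str) -> list:
    """Return the argument lists for: make the HDFS directory, then copy into it."""
    # End the destination with "/" so it is clearly a directory, not a file name.
    target_dir = hdfs_dir.rstrip("/") + "/"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target_dir],
        ["hdfs", "dfs", "-copyFromLocal", local_file, target_dir],
    ]


for argv in copy_commands("/path/to/foo.bar", "/user/jdoe/contint"):
    print(shlex.join(argv))
```

To execute the commands rather than print them, each argument list could be passed to subprocess.run(argv, check=True) on a server where the hdfs command is on the PATH.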

How Do You Install kubectl, kubeadm, and kubelet on a CentOS/RHEL/Fedora Server?

Problem scenario
You have a Red Hat derivative distribution of Linux.  You want to use some core Kubernetes utilities.  How do you install kubectl, kubeadm, and kubelet on Linux?

Solution
1.  Create a file named /etc/yum.repos.d/kubernetes.repo (yum only reads files in that directory whose names end in ".repo").  Have this be the content:

[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
        https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg

2.  Run this: sudo yum -y install kubelet kubeadm kubectl