How to Install Hadoop on an AWS Instance of RedHat Linux or an Azure Instance of CentOS

Updated 1/5/18

THESE DIRECTIONS ARE OUTDATED.  They are here as a reference for legacy purposes only.  For directions on how to install Hadoop on a RedHat or CentOS server, see this article.

Problem scenario

You want to install an open source version of Hadoop on a RedHat derivative distribution of Linux in a public cloud.  How do you do this?

Solution
These directions will allow you to install Hadoop on a RedHat derivative (e.g., CentOS or RHEL) in a public cloud.  These directions are for a single-node deployment.  (For a multi-node deployment, see these directions.)  They include a script, directions on how to run the script, and the other necessary commands.  These directions have been tested to work on RedHat 7.4 in AWS and CentOS 6.8 in Azure.  The script was designed to install the open source version of Hadoop 2.8.1.

1.  Log into CentOS or RHEL.    
2.  Run these commands interactively, as they are not easily or safely scripted:

sudo adduser hadoop
sudo passwd hadoop  #respond with the password of your choice
ssh-keygen -t rsa -P "" # press Enter at each prompt to accept the defaults
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # sudo is not needed; these files are owned by your user
chmod 0600 ~/.ssh/authorized_keys
ssh 127.0.0.1   # type 'yes' to accept the fingerprint, then run 'exit' to return to your original shell
cd /tmp
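
Before proceeding, you can optionally confirm that key-based SSH to the local host works without a password prompt.  This is a minimal check (the BatchMode option makes ssh fail instead of prompting for a password):

ssh -o BatchMode=yes 127.0.0.1 'echo "key-based SSH to localhost works"'   # prints the message only if no password is required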

3.  Create a script with the content below from "#!/bin/bash" to "Proceed with the manual steps".  Beware that this script could overwrite some files.  This script was intended for a new OS with no data or special configuration on it.

#!/bin/bash
# Written by continualintegration.com
yum install -y java-1.7.0-openjdk-devel wget
cd /home/hadoop
wget http://apache.claz.org/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
tar xzf hadoop-2.8.1.tar.gz
mv hadoop-2.8.1 hadoop
echo '
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin' >> ~/.bashrc
source ~/.bashrc
sed -i '/export JAVA_HOME/c\export JAVA_HOME=/usr/.' /home/hadoop/hadoop/etc/hadoop/hadoop-env.sh
echo '<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri has a scheme that determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri has authority that is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
' > /home/hadoop/hadoop/etc/hadoop/core-site.xml
echo "Proceed with the manual steps"

4.  sudo bash nameOfscript.sh  # You must run the script above as a sudoer or the root user.
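
After the script finishes, a quick sanity check (a minimal sketch, assuming the default paths used by the script above) is to verify that the Hadoop directory was extracted and core-site.xml was written:

ls -ld /home/hadoop/hadoop   # the extracted Hadoop directory should exist
grep -A 1 'fs.default.name' /home/hadoop/hadoop/etc/hadoop/core-site.xml   # should show the hdfs://localhost:54310 value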

5.  To get all the hdfs commands to work, one approach is to allow root to SSH to the local server and then start the Hadoop daemons from a root shell.  There are other ways, but here is how to do it by relying on sudo:

sudo su -
vi /etc/ssh/sshd_config   # edit the SSH daemon configuration to allow the root user to log in

# For CentOS in Azure, change "PermitRootLogin no" to "PermitRootLogin yes"

# For RHEL in AWS, change "#PermitRootLogin yes" to "PermitRootLogin yes" (i.e., remove the leading "#")

service sshd stop
service sshd start
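
If you prefer not to edit sshd_config by hand, a non-interactive alternative is sketched below.  It assumes the stock PermitRootLogin lines described above and backs up the file first:

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak   # keep a backup copy
sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config   # enable (and uncomment) root login
grep '^PermitRootLogin' /etc/ssh/sshd_config   # should now print "PermitRootLogin yes"
service sshd restart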

ssh-keygen -t rsa -P "" # press Enter at each prompt to accept the defaults

If you are using RedHat in AWS, use this command:
cat /home/ec2-user/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

If you are using CentOS in Azure, use this command (but substitute your username for centos if it differs):
cat /home/centos/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

Run these commands regardless of which OS or public cloud provider you are using:
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

ssh root@localhost   # type 'yes' to accept the fingerprint if prompted
ssh root@localhost   # do it again; this opens a second, nested SSH session
exit   # exit the second SSH session
exit   # exit the first SSH session

6.  Complete the configuration and start the HDFS daemons (including the NameNode) while logged in as root:

/home/hadoop/hadoop/bin/hdfs namenode -format   # the full path is used in case $HADOOP_HOME/bin is not in root's PATH
bash /home/hadoop/hadoop/sbin/start-dfs.sh
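
To confirm that the HDFS daemons started, one quick check is sketched below (the jps utility is provided by the openjdk-devel package installed earlier, and /test is just an arbitrary example path):

jps   # should list NameNode, DataNode, and SecondaryNameNode processes
/home/hadoop/hadoop/bin/hdfs dfs -mkdir /test   # create a test directory in HDFS
/home/hadoop/hadoop/bin/hdfs dfs -ls /          # the new /test directory should appear in the listing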

