How Do You Install Apache Spark on Any Type of Linux?

Problem scenario
You want a generic script that can install open source Apache Spark on Debian/Ubuntu, CentOS/Red Hat/Fedora, or SUSE distributions of Linux. How do you do this?

Solution
1.  Create a script such as this in /tmp/ (e.g., /tmp/spark.sh).

#!/bin/bash
# Written by www.continualintegration.com

sparkversion=2.2.1  # Change this version as necessary

distro=$(cat /etc/*-release | grep NAME)   # Capture the distribution name line(s), e.g., NAME="Ubuntu"

debflag=$(echo $distro | grep -i "ubuntu")
if [ -z "$debflag" ]
then   # If it is not Ubuntu, test if it is Debian.
  debflag=$(echo $distro | grep -i "debian")
  echo "determining Linux distribution..."
else
   echo "You have Ubuntu Linux!"
fi

rhflag=$(echo $distro | grep -i "red.*hat")   # ".*" matches both "RedHat" and "Red Hat"
if [ -z "$rhflag" ]
then   #If it is not RedHat, see if it is CentOS or Fedora.
  rhflag=$(echo $distro | grep -i "centos")
  if [ -z "$rhflag" ]
    then    #If it is neither RedHat nor CentOS, see if it is Fedora.
    echo "It does not appear to be CentOS or RHEL..."
    rhflag=$(echo $distro | grep -i "fedora")
    fi
fi

if [ -z "$rhflag" ]
  then
  echo "...still determining Linux distribution..."
else
  echo "You have a RedHat distribution (e.g., CentOS, RHEL, or Fedora)"
  yum -y install java-1.8.0-openjdk*
  JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
  echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' >> ~/.bashrc
  source ~/.bashrc
  source /etc/environment
fi

if [ -z "$debflag" ]
then
  echo "...still determining Linux distribution..."
else
   echo "You are using either Ubuntu Linux or Debian Linux."
   apt-get -y update # This is necessary on new AWS Ubuntu servers.
   apt -y install openjdk-8-jre-headless
fi

suseflag=$(echo $distro | grep -i "suse")
if [ -z "$suseflag" ]
then
  if [ -z "$debflag" ]
  then
    if [ -z "$rhflag" ]
      then
      echo "*******************************************"
      echo "Could not determine the Linux distribution!"
      echo "Installation aborted. Nothing was done."
      echo "******************************************"
      exit
    fi
  fi
else
  zypper -n install java-1_8_0-openjdk   # -n answers prompts non-interactively
fi
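
# Optional sanity check: confirm that a Java runtime is now available before
# downloading Spark. ("command -v" returns nonzero if java is not on the PATH.)
if ! command -v java >/dev/null 2>&1
then
  echo "Java does not appear to have been installed; aborting."
  exit 1
fi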

# -f makes curl fail on HTTP errors instead of saving an error page; -L follows redirects.
curl -fL http://ftp.wayne.edu/apache/spark/spark-$sparkversion/spark-$sparkversion-bin-hadoop2.7.tgz -o /bin/spark-$sparkversion-bin-hadoop2.7.tgz

cd /bin
tar -zxvf spark-$sparkversion-bin-hadoop2.7.tgz
mkdir -p /usr/local/spark
mv /bin/spark-$sparkversion-bin-hadoop2.7/* /usr/local/spark

echo 'export PATH=$PATH:/usr/local/spark/bin' > /etc/profile.d/spark.sh   # Takes effect in new login shells

echo 'To test the installation, run "/usr/local/spark/bin/spark-shell"'
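
Note: university mirrors such as ftp.wayne.edu periodically prune old releases. If the download fails, the Apache archive retains every Spark release; a possible substitute for the curl line above (same file name, different host) is:

curl -fL https://archive.apache.org/dist/spark/spark-$sparkversion/spark-$sparkversion-bin-hadoop2.7.tgz -o /bin/spark-$sparkversion-bin-hadoop2.7.tgz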

2.  Run it like this:

sudo bash /tmp/spark.sh
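
3.  (Optional) Verify the installation. The PATH entry written to /etc/profile.d/spark.sh only applies to new login sessions, so the command below uses the full path the script installs to. Piping a one-line job into spark-shell runs it non-interactively:

echo 'println(sc.parallelize(1 to 100).reduce(_ + _))' | /usr/local/spark/bin/spark-shell

A successful run prints 5050 amid the shell's output.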
