How Do You Know What Version of R Is Installed on Your Linux System?

Problem scenario
You want to determine which version of R is installed.  How do you find this out in Linux?

Solution
Run this command:
R --version
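If you need the version inside a script rather than at an interactive prompt, a small Python sketch can run the same command and extract the version number. It assumes `R` is on your `PATH` and that the output contains a line like `R version 4.3.1 (2023-06-16) ...`. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_r_version(output: str) -> Optional[str]:
    """Return the 'X.Y.Z' version from `R --version` output, or None if absent."""
    match = re.search(r"R version (\d+\.\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_r_version() -> Optional[str]:
    """Run `R --version` and parse it; return None if R is not on the PATH."""
    if shutil.which("R") is None:
        return None
    result = subprocess.run(["R", "--version"], capture_output=True, text=True, check=False)
    # Search both streams so the parse does not depend on where R writes the banner.
    return parse_r_version(result.stdout + result.stderr)
```

For example, `installed_r_version()` returns a string such as `"4.3.1"`, or `None` on a machine without R.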

How Do You Connect to Your Apache Spark Deployment in AWS?

Problem scenario
You have recently deployed Apache Spark to AWS.  You can see that the EC2 instances were created, but you cannot reach them through the web UI (even on ports 4140, 8088, or 50070), and you cannot connect to them with PuTTY.  You changed your usual Security Group to allow TCP traffic from your workstation's IP address.  What should you do to connect to your new Spark instance for the first time?
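As a first diagnostic step, it can help to confirm from your workstation which ports actually accept TCP connections. The Python sketch below only checks reachability; it is not the fix described in the full article. The function name is hypothetical, and port 22 is worth checking alongside the web UI ports because PuTTY connects over SSH.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures are all OSError subclasses.
        return False
```

For example, loop over `(22, 4140, 8088, 50070)` and call `tcp_port_open("<your instance's public DNS name>", port)` for each; ports that report `False` from every client point toward a network-level block rather than a stopped service.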

How Do You Deploy an Apache Spark Cluster in AWS?

Problem scenario
You want to deploy Apache Spark to AWS.  How do you do this?

Solution
1.  Log into the AWS management console.  Once in, go to this link.

2.  Click “Create cluster” and then “Quick Create”.

3.  For “Software configuration”, choose “Spark:…”

4.  Under “Security and access”, for “EC2 key pair”, choose the key pair you want to use.
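The same kind of cluster can also be requested programmatically. This Python sketch builds a request for boto3's EMR `run_job_flow` call with Spark selected and your key pair attached. The cluster name, release label, instance type, instance count, and region are assumptions for illustration; the two role names are EMR's default roles, which must already exist in your account (for example, after running `aws emr create-default-roles`).

```python
def build_spark_cluster_request(key_pair_name: str,
                                release_label: str = "emr-5.36.0",  # assumed release
                                instance_type: str = "m5.xlarge",   # assumed size
                                instance_count: int = 3) -> dict:
    """Build keyword arguments for EMR run_job_flow with Spark installed."""
    return {
        "Name": "spark-cluster",  # hypothetical cluster name
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "Ec2KeyName": key_pair_name,  # same choice as step 4 in the console
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def create_spark_cluster(key_pair_name: str, region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster ID (needs AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_spark_cluster_request(key_pair_name))
    return response["JobFlowId"]
```

Keeping the request builder separate from the API call makes it easy to inspect or adjust the settings before anything is created (and billed) in AWS.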

A Long List of Hadoop Books

Advanced Analytics with Spark: Patterns for Learning from Data at Scale by Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills
Agile Data Science: Building Data Analytics Applications with Hadoop by Russell Jurney
Apache Drill: The SQL query engine for Hadoop and NoSQL by Ted Dunning, Ellen Friedman, Tomer Shiran and Jacques Nadeau
Apache Flume: Distributed Log Collection for Hadoop - Second Edition by Steve Hoffman
Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 (Addison-Wesley Data & Analytics) by Arun Murthy, Vinod Vavilapalli, Douglas Eadline, Joseph Niemiec and Jeff Markham

A List of Apache Spark Books

99 Apache Spark Interview Questions for Professionals by Yogesh Kumar
Advanced Analytics with Spark: Patterns for Learning from Data at Scale by Juliet Hougland, Uri Laserson, Sean Owen, Sandy Ryza and Josh Wills
Apache Spark 2 for Beginners by Rajanarayanan Thottuvaikkatumana
Apache Spark in 24 Hours, Sams Teach Yourself by Jeffrey Aven
Apache Spark for Data Science Cookbook by Padma Priya Chitturi
Apache Spark for Java Developers by Sumit Kumar and Sourav Gulati
Apache Spark Graph Processing by Rindra Ramamonjison
Apache Spark Interview Question &

How Do You Configure Linux To Be Ready for Cloudera 5 (Hadoop)?

Two problem scenarios and solutions.

Problem scenario:  You installed Cloudera 5 (Hadoop) on CentOS 7.2 for the first time.  You tried to start the Cloudera database service, but it failed with an error.

You run this: systemctl status cloudera-scm-server-db.service

The results include this fragment:  “… Failed to start LSB: Cloudera SCM Server’s Embedded DB.”

How do you start the Cloudera DB service?
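To script the status check from this scenario, a small Python sketch can run the same `systemctl status` command and pull out any “Failed to start” lines. It reproduces only the diagnosis, not the fix, and the helper names are hypothetical. The exit code is deliberately not treated as an error, because `systemctl status` exits non-zero whenever the unit is not running.

```python
import subprocess
from typing import List

UNIT = "cloudera-scm-server-db.service"


def failure_lines(status_output: str) -> List[str]:
    """Return the lines of `systemctl status` output that report a failed start."""
    return [line.strip() for line in status_output.splitlines() if "Failed to start" in line]


def check_unit(unit: str = UNIT) -> List[str]:
    """Run `systemctl status` for the unit and return any failure lines found."""
    result = subprocess.run(
        ["systemctl", "status", "--no-pager", unit],
        capture_output=True,
        text=True,
        check=False,  # non-zero exit is expected for a stopped or failed unit
    )
    return failure_lines(result.stdout)
```

An empty list from `check_unit()` means the output contained no “Failed to start” message; it does not by itself prove the database is running.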

What is TCP port 8080 typically used for?

Question
What is TCP port 8080 typically used for?

Answer
Jenkins, Docker, Node.js, Apache Ambari, Marathon (for Apache Mesos), Apache Tomcat, Amazon Web Services’ Elastic Load Balancer, JBoss Application Server, GitLab, M2MLogger (remote monitoring), InfoSphere BigInsights Console (IBM’s proprietary Hadoop and Spark solution), JasperReports (because of Apache Tomcat), remote management of physical routers, and enterprise network proxy services all commonly use port 8080.

In part, taken from Learning AWS
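Because so many services default to port 8080, two of them on the same host will collide. One quick way to check whether the port is already taken before starting another service is to try binding to it. This Python sketch assumes you are checking the local machine and that the new service would listen on all IPv4 interfaces; the function name is hypothetical.

```python
import socket


def is_port_free(port: int = 8080, host: str = "0.0.0.0") -> bool:
    """Return True if no other socket on this machine already holds host:port (TCP)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails (typically with EADDRINUSE) when the port is already in use.
            return False
    return True
```

If `is_port_free()` returns `False`, `ss -ltnp` on Linux shows which process is listening on the port.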

How do you install Hadoop on Linux SUSE?

Updated on 1/1/18
You may want to see this other posting, which works for Linux SUSE as well as for other Linux distributions.

Problem scenario
You want to install the open source version of Hadoop 3.0.0 on Linux SUSE 12 SP3.  What do you do?

Solution
These directions would also work on a non-SUSE distribution if you install Java with a different command than the first one in the script.
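One way to handle that difference is to choose the Java installation command from the distribution's `/etc/os-release` file. This Python sketch is an illustration under assumptions, not the script from the full article: the package names (`java-1_8_0-openjdk` for SUSE, `default-jre` for Debian/Ubuntu, `java-1.8.0-openjdk` for RedHat derivatives) are typical choices and may differ on your release, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from /etc/os-release, dropping surrounding quotes."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def java_install_command(os_release_text: str) -> List[str]:
    """Pick a Java install command from the distribution family (assumed package names)."""
    fields = parse_os_release(os_release_text)
    family = {fields.get("ID", "")} | set(fields.get("ID_LIKE", "").split())
    if family & {"sles", "suse", "opensuse"} or any(f.startswith("opensuse") for f in family):
        return ["zypper", "--non-interactive", "install", "java-1_8_0-openjdk"]
    if family & {"debian", "ubuntu"}:
        return ["apt-get", "install", "-y", "default-jre"]
    if family & {"rhel", "centos", "fedora"}:
        return ["yum", "install", "-y", "java-1.8.0-openjdk"]
    raise ValueError(f"Unrecognized distribution: {fields.get('ID', 'unknown')}")


def local_java_install_command() -> List[str]:
    """Read this machine's /etc/os-release and return the matching command."""
    return java_install_command(Path("/etc/os-release").read_text())
```

Matching on both `ID` and `ID_LIKE` lets derivatives such as CentOS fall into the RedHat branch without listing every distribution by name.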

How to Install Hadoop on an AWS Instance of RedHat Linux or an Azure Instance of CentOS

Updated 1/5/18

THESE DIRECTIONS ARE OUTDATED.  They are here as a reference for legacy purposes only.  For directions on how to install Hadoop on a RedHat or CentOS server, see this article.

Problem scenario
You want to install an open source version of Hadoop on a RedHat-derivative distribution of Linux in a public cloud.  How do you do this?

Solution
These directions will allow you to install Hadoop on a RedHat derivative (e.g.,

How To Get Hadoop Installed on Ubuntu When There Is a Java Error

Problem scenario:  On Ubuntu, after installing Hadoop, you enter the command ‘/usr/local/hadoop/bin/hadoop namenode -format’ and get this error:

“Error: JAVA_HOME is not set and could not be found.”

Solution:  Verify Java is installed (‘java -version’).  If it is not installed, you can use ‘apt-get install -y default-jre’.

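Next, open hadoop-env.sh (for Hadoop 2.x and later installed under /usr/local/hadoop, it is typically /usr/local/hadoop/etc/hadoop/hadoop-env.sh).  Find the export JAVA_HOME line.  Change its ‘${JAVA_HOME}’ value to ‘/usr’ without quotes, so the line reads export JAVA_HOME=/usr.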
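The edit can also be scripted. This Python sketch rewrites any uncommented `export JAVA_HOME=` line to point at `/usr`, or appends one if none exists. The file path is an assumption based on the `/usr/local/hadoop` install location above and the Hadoop 2.x-and-later directory layout; the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed location for a Hadoop 2.x+ install under /usr/local/hadoop.
HADOOP_ENV = Path("/usr/local/hadoop/etc/hadoop/hadoop-env.sh")
# Matches uncommented lines such as: export JAVA_HOME=${JAVA_HOME}
EXPORT_LINE = re.compile(r"^(\s*export\s+JAVA_HOME=).*$", re.MULTILINE)


def set_java_home(env_text: str, java_home: str = "/usr") -> str:
    """Point every uncommented `export JAVA_HOME=` line at java_home, appending one if absent."""
    new_text, count = EXPORT_LINE.subn(lambda m: m.group(1) + java_home, env_text)
    if count == 0:
        new_text = env_text.rstrip("\n") + f"\nexport JAVA_HOME={java_home}\n"
    return new_text


def fix_hadoop_env(path: Path = HADOOP_ENV, java_home: str = "/usr") -> None:
    """Rewrite hadoop-env.sh in place (requires write permission on the file)."""
    path.write_text(set_java_home(path.read_text(), java_home))
```

After running `fix_hadoop_env()`, rerun the `namenode -format` command to confirm the JAVA_HOME error is gone.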