Announcement: Big Data Quiz Now Available!

We worked very hard to write an original twelve-question Big Data Quiz.  Please do not expect our usual new posting for several days.  Please be sure to check out the Big Data Quiz!

 » Read more..

How Do You Troubleshoot the Spark-Shell Error “A JNI error has occurred”?

Problem scenario
You run spark-shell on a Debian-based distribution of Linux (e.g., Ubuntu), but you receive this error:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 64
        at java.util.jar.JarFile.match(java.base@9-internal/
        at java.util.jar.JarFile.checkForSpecialAttributes(java.base@9-internal/
        at java.util.jar.JarFile.isMultiRelease(java.base@9-internal/
        at java.util.jar.JarFile.getEntry(java.base@9-internal/
        at java.util.jar.JarFile.getJarEntry(java.base@9-internal/
        at jdk.internal.util.jar.JarIndex.getJarIndex(java.base@9-internal/
        at jdk.internal.loader.URLClassPath$JarLoader$
        at jdk.internal.loader.URLClassPath$JarLoader$
        at Method)
        at jdk.internal.loader.URLClassPath$JarLoader.ensureOpen(java.base@9-internal/
        at jdk.internal.loader.URLClassPath$JarLoader.<init>(java.base@9-internal/
        at jdk.internal.loader.URLClassPath$
        at jdk.internal.loader.URLClassPath$
        at Method)
        at jdk.internal.loader.URLClassPath.getLoader(java.base@9-internal/
        at jdk.internal.loader.URLClassPath.getLoader(java.base@9-internal/
        at jdk.internal.loader.URLClassPath.getResource(java.base@9-internal/
        at jdk.internal.loader.BuiltinClassLoader$
        at jdk.internal.loader.BuiltinClassLoader$
        at Method)
        at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@9-internal/
        at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@9-internal/
        at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@9-internal/
        at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@9-internal/
        at java.lang.ClassLoader.loadClass(java.base@9-internal/
        at sun.launcher.LauncherHelper.loadMainClass(java.base@9-internal/
        at sun.launcher.LauncherHelper.checkAndLoadMain(java.base@9-internal/

How do you solve this?
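
A possible first diagnostic (a sketch, not necessarily the fix the full post describes): every frame in the trace above comes from java.base@9-internal, so the JVM that launched spark-shell is a Java 9 build. The helpers below, with the hypothetical names java_major_version and current_java_major, extract the major version from a Java version string so you can compare it with the Java release that your Spark version documents as supported.

```shell
# Sketch only: java_major_version and current_java_major are hypothetical helper names.

# Print the major version for a Java version string.
# Handles the legacy "1.8.0_151" style as well as "9-internal" or "11.0.2".
java_major_version() {
    jmv_value="${1#1.}"                 # drop the legacy "1." prefix if present
    jmv_value="${jmv_value%%[!0-9]*}"   # keep only the leading digits
    printf '%s\n' "$jmv_value"
}

# Print the major version of whichever java is first on the PATH.
# java -version writes to stderr, hence the 2>&1.
current_java_major() {
    jmv_raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    java_major_version "$jmv_raw"
}
```

Running current_java_major on the affected machine shows which Java the shell picks up by default; if more than one JDK is installed, the one first on the PATH is the one reported.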

 » Read more..

How Do You Install Apache Spark on Any Type of Linux?

Problem scenario
You want a generic script that can install open source Apache Spark on Debian/Ubuntu, CentOS/RedHat/Fedora or SUSE distributions of Linux.  How do you do this?

1.  Create a script such as this in /tmp/ (e.g., /tmp/

# Written by

sparkversion=2.2.1  # Change this version as necessary

distro=$(cat /etc/*-release | grep NAME)

debflag=$(echo "$distro" | grep -i "ubuntu")
if [ -z "$debflag" ]

 » Read more..
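
The distribution check in the excerpt above can be sketched as a small function. This is a variation under stated assumptions, not the author's full script: it reads a single os-release style file instead of /etc/*-release, and the function name detect_family and the family labels it prints are hypothetical.

```shell
# Sketch only: detect_family and the labels it prints are hypothetical.
# Classify a Linux distribution by searching an os-release style file.
detect_family() {
    # $1: path to the release file (defaults to /etc/os-release)
    df_file="${1:-/etc/os-release}"
    if grep -Eqi 'ubuntu|debian' "$df_file"; then
        echo "debian"
    elif grep -Eqi 'centos|red hat|rhel|fedora' "$df_file"; then
        echo "redhat"
    elif grep -qi 'suse' "$df_file"; then
        echo "suse"
    else
        echo "unknown"
    fi
}
```

An install script could branch on the result, for example using apt-get for "debian", yum for "redhat" and zypper for "suse".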

What Are the Different Acronym Stacks in I.T.?

What are the different acronym stacks in I.T.?

There are many open source combinations of technologies that are in wide use.  These acronyms, referred to as “full stacks” or “stacks,” appear in articles and job descriptions.  A full stack is a bundle of software that (includes an OS and) can create a complete and functional product when properly configured.

 » Read more..

How Do You Know If Apache Spark Has Been Installed?

Problem scenario
You are looking for the Hadoop components’ versions.  You run these commands:

hadoop version
hdfs version
yarn version

You notice the output is the same for each of the three commands above.  You are not sure if Apache Spark has been installed.  What do you do?

Run this command:

spark-submit --version
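
If that command is not found, Spark may still be installed but not on your PATH. A minimal sketch of a check, assuming Spark's launcher lives either on the PATH or under $SPARK_HOME/bin (check_spark is a hypothetical function name):

```shell
# Sketch only: check_spark is a hypothetical helper name.
# Report where, if anywhere, the spark-submit launcher can be found.
check_spark() {
    if command -v spark-submit >/dev/null 2>&1; then
        echo "spark-submit found on PATH"
    elif [ -n "${SPARK_HOME:-}" ] && [ -x "$SPARK_HOME/bin/spark-submit" ]; then
        echo "spark-submit found under SPARK_HOME"
    else
        echo "spark-submit not found"
    fi
}
```

A "not found" result only means neither location has the launcher; Spark could still be unpacked in some other directory.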

 » Read more..

How Do You Connect to Your Apache Spark Deployment in AWS?

Problem scenario
You have recently deployed Apache Spark to AWS.  You see the EC2 instances were created.  But you cannot access them over the web UI (even over ports 4040, 8088, or 50070).  You cannot access the instances via PuTTY.  You changed your normal Security Group to allow TCP communication from your workstation’s IP address.  What should you do to connect to your new Spark instance for the first time?

 » Read more..

How Do You Deploy an Apache Spark Cluster in AWS?

Problem scenario
You want to deploy Apache Spark to AWS.  How do you do this?

1.  Log into the AWS management console.  Once in, go to this link.

2.  Click “Create cluster” and then “Quick Create.”

3. For Software Configuration, choose “Spark:…”

4. For “Security and access”, for EC2 key pair, choose the key pair you desire.
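
The console steps above have a rough command-line counterpart in the AWS CLI. This is a different technique from the one the post describes, shown only as a sketch: the function name build_emr_command, the cluster name, the instance type and the instance count are all assumptions, and the key pair name and EMR release label are placeholders you must supply. The function prints the command and does not run it.

```shell
# Sketch only: build_emr_command is a hypothetical helper name.
# Print (do not run) an AWS CLI command that requests an EMR cluster with Spark.
build_emr_command() {
    # $1: EC2 key pair name (placeholder)
    # $2: EMR release label (placeholder)
    bec_key="$1"
    bec_release="$2"
    # The cluster name, instance type and instance count below are assumptions.
    printf 'aws emr create-cluster --name %s --release-label %s --applications Name=Spark --ec2-attributes KeyName=%s --instance-type m4.large --instance-count 3 --use-default-roles\n' \
        "spark-cluster" "$bec_release" "$bec_key"
}
```

Review the printed command before running it yourself, since creating a cluster incurs AWS charges.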

 » Read more..

A List of Apache Spark Books

99 Apache Spark Interview Questions for Professionals by Yogesh Kumar
Advanced Analytics with Spark: Patterns for Learning from Data at Scale by Juliet Hougland, Uri Laserson, Sean Owen, Sandy Ryza and Josh Wills
Apache Spark 2 for Beginners by Rajanarayanan Thottuvaikkatumana
Apache Spark in 24 Hours, Sams Teach Yourself by Jeffrey Aven
Apache Spark for Data Science Cookbook by Padma Priya Chitturi
Apache Spark for Java Developers by Sumit Kumar and Sourav Gulati
Apache Spark Graph Processing by Rindra Ramamonjison
Apache Spark Interview Question &

 » Read more..