Big Data – Page 7 – CONTINUAL INTEGRATION

Announcement: Big Data Quiz Now Available!

03/12/2018 (updated 06/03/2019)

We worked very hard to write an original twelve-question Big Data Quiz. Unlike our usual schedule, there will be no new postings for several days. Please be sure to check out the Big Data Quiz!

…


How Do You Troubleshoot the Spark-Shell Error “A JNI error has occurred”?

03/10/2018 (updated 10/11/2020)

Problem scenario
You run spark-shell on a Debian-based distribution of Linux (e.g., Ubuntu), but you receive this error:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 64
        at java.util.jar.JarFile.match(java.base@9-internal/JarFile.java:983)
        at java.util.jar.JarFile.checkForSpecialAttributes(java.base@9-internal/JarFile.java:1017)
        at java.util.jar.JarFile.isMultiRelease(java.base@9-internal/JarFile.java:399)
        at java.util.jar.JarFile.getEntry(java.base@9-internal/JarFile.java:524)
        at java.util.jar.JarFile.getJarEntry(java.base@9-internal/JarFile.java:480)
        at jdk.internal.util.jar.JarIndex.getJarIndex(java.base@9-internal/JarIndex.java:114)
        at jdk.internal.loader.URLClassPath$JarLoader$1.run(java.base@9-internal/URLClassPath.java:640)
        at jdk.internal.loader.URLClassPath$JarLoader$1.run(java.base@9-internal/URLClassPath.java:632)
        at java.security.AccessController.doPrivileged(java.base@9-internal/Native Method)
        at jdk.internal.loader.URLClassPath$JarLoader.ensureOpen(java.base@9-internal/URLClassPath.java:631)
        at jdk.internal.loader.URLClassPath$JarLoader.<init>(java.base@9-internal/URLClassPath.java:606)
        at jdk.internal.loader.URLClassPath$3.run(java.base@9-internal/URLClassPath.java:386)
        at jdk.internal.loader.URLClassPath$3.run(java.base@9-internal/URLClassPath.java:376)
        at java.security.AccessController.doPrivileged(java.base@9-internal/Native Method)
        at jdk.internal.loader.URLClassPath.getLoader(java.base@9-internal/URLClassPath.java:375)
        at jdk.internal.loader.URLClassPath.getLoader(java.base@9-internal/URLClassPath.java:352)
        at jdk.internal.loader.URLClassPath.getResource(java.base@9-internal/URLClassPath.java:218)
        at jdk.internal.loader.BuiltinClassLoader$3.run(java.base@9-internal/BuiltinClassLoader.java:463)
        at jdk.internal.loader.BuiltinClassLoader$3.run(java.base@9-internal/BuiltinClassLoader.java:460)
        at java.security.AccessController.doPrivileged(java.base@9-internal/Native Method)
        at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(java.base@9-internal/BuiltinClassLoader.java:459)
        at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(java.base@9-internal/BuiltinClassLoader.java:406)
        at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@9-internal/BuiltinClassLoader.java:364)
        at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@9-internal/ClassLoaders.java:184)
        at java.lang.ClassLoader.loadClass(java.base@9-internal/ClassLoader.java:419)
        at sun.launcher.LauncherHelper.loadMainClass(java.base@9-internal/LauncherHelper.java:585)
        at sun.launcher.LauncherHelper.checkAndLoadMain(java.base@9-internal/LauncherHelper.java:497)

How do you solve this?

…

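Every frame in the stack trace above carries `java.base@9-internal`, which suggests spark-shell is being launched on a pre-release OpenJDK 9 build. A minimal diagnostic sketch for confirming which Java runtime is on the PATH follows. The helper name `java_is_9_internal` is hypothetical, and the check simply looks for the string `9-internal` in the `java -version` banner; it does not fix anything by itself.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the `java -version` banner passed on
# stdin comes from an OpenJDK "9-internal" build, the runtime seen in
# the stack trace.
java_is_9_internal() {
  grep -q '9-internal'
}

# `java -version` writes its banner to stderr, so merge it into stdout.
if command -v java >/dev/null 2>&1; then
  if java -version 2>&1 | java_is_9_internal; then
    echo "java on PATH is an OpenJDK 9-internal build"
  else
    echo "java on PATH is not a 9-internal build"
  fi
else
  echo "java is not on the PATH"
fi
```

If the first message appears, the JVM that spark-shell picks up is the same kind of build shown in the trace.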

How Do You Install Apache Spark on Any Type of Linux?

03/09/2018 (updated 05/23/2019)

Problem scenario
You want a generic script that can install open source Apache Spark on Debian/Ubuntu, CentOS/Red Hat/Fedora, or SUSE distributions of Linux. How do you do this?

Solution
1. Create a script such as this in /tmp/ (e.g., /tmp/spark.sh).

#!/bin/bash
# Written by www.continualintegration.com

sparkversion=2.2.1 # Change this version as necessary

distro=$(cat /etc/*-release | grep NAME)

debflag=$(echo "$distro" | grep -i "ubuntu")
if [ -z "$debflag" ]
then

…

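The excerpt decides which distribution it is running on by grepping the NAME field out of /etc/*-release and testing for "ubuntu". A minimal alternative sketch follows, assuming the host provides the freedesktop.org /etc/os-release file with ID and ID_LIKE fields (most current Ubuntu, Debian, CentOS, RHEL, Fedora, and SUSE releases do). It maps the distribution family to a package manager. The function name `pick_pkg_manager` is hypothetical and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): map the ID and
# ID_LIKE fields of an os-release file to a package manager command.
pick_pkg_manager() {
  local release_file="${1:-/etc/os-release}"
  local id="" id_like=""
  # os-release holds KEY=value lines; values may be wrapped in quotes.
  id=$(grep -E '^ID=' "$release_file" | cut -d= -f2 | tr -d '"')
  id_like=$(grep -E '^ID_LIKE=' "$release_file" | cut -d= -f2 | tr -d '"')
  # Pad with spaces so each pattern matches a whole word only.
  case " $id $id_like " in
    *" debian "*|*" ubuntu "*) echo "apt-get" ;;
    *" suse "*|*" opensuse "*|*" sles "*) echo "zypper" ;;
    *" rhel "*|*" fedora "*|*" centos "*) echo "yum" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example, `pkg=$(pick_pkg_manager) && sudo "$pkg" -y install curl` would pick apt-get on Ubuntu (ID_LIKE=debian) and yum on CentOS (ID_LIKE="rhel fedora"). Reading ID_LIKE covers derivatives that the NAME-based grep would miss.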

How Do You Install Apache Solr on Any Type of Linux?

03/07/2018 (updated 04/28/2020)

Problem scenario
You want a generic script that can install open source Apache Solr on Debian/Ubuntu, CentOS/Red Hat/Fedora, or SUSE distributions of Linux. How do you do this with a single bash script?

Solution
1. Create a script such as this in /tmp/ (e.g., /tmp/solr.sh).

#!/bin/bash
# Written by www.continualintegration.com

solrversion=7.2.1 # Change this version as necessary

distro=$(cat /etc/*-release | grep NAME)

debflag=$(echo "$distro" | grep -i "ubuntu")
if [ -z "$debflag" ]
then

…


How Do You Troubleshoot an Empty Multi-node Hadoop Cluster?

03/02/2018 (updated 01/01/2021)

Problem scenario
One or more of the following is happening:
   1) There are 0 DataNodes in your Hadoop cluster according to an error message
   2) The configured capacity is 0 B (as shown by the “hdfs dfsadmin -report” command).
   3) There is one fewer DataNode in your Hadoop cluster than you expect.
   4) You run “hdfs dfsadmin -report | grep Hostname” and do not see a node that has its DataNode service (as seen with the

…

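To check symptoms 1) and 3) from a script, a minimal sketch follows that reads `hdfs dfsadmin -report` output and extracts the live DataNode count. It assumes the report contains a line of the form `Live datanodes (N):`, as Hadoop 2.x and 3.x print; the function name `live_datanodes` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the number of live DataNodes found in
# `hdfs dfsadmin -report` output read from stdin. Prints 0 when no
# "Live datanodes (N):" line is present.
live_datanodes() {
  local count
  # In a basic regex the bare parentheses are literal; \( \) captures the digits.
  count=$(sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1)
  echo "${count:-0}"
}
```

For example, `hdfs dfsadmin -report | live_datanodes` printing 0 matches symptom 1), and a value one lower than the number of DataNode hosts you configured matches symptom 3).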

How Do You Add a New Node to a Hadoop Cluster?

01/31/2018 (updated 06/03/2019)

Problem scenario
You have a multi-node Hadoop cluster and want to add a new DataNode. What do you do?

Solution
1. a) Log into the server that will be the new DataNode. Perform the following substeps on that server before moving on to step 2.

b) Install Hadoop on the new DataNode. If you do not know how, see this posting.

…

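One task commonly involved in adding a DataNode is listing the new host in the NameNode's workers file (named `slaves` in Hadoop 2.x and `workers` in Hadoop 3.x) so that the cluster-wide start scripts reach it. The sketch below appends a hostname only if it is not already listed. The function name `add_worker` and the example paths and hostnames are assumptions, not steps taken from the original posting.

```shell
#!/bin/bash
# Hypothetical helper: append a hostname to a Hadoop workers/slaves
# file unless an identical line is already present.
add_worker() {
  local workers_file="$1" host="$2"
  # -x matches whole lines; -F treats the hostname literally (dots are not wildcards).
  if grep -qxF "$host" "$workers_file" 2>/dev/null; then
    echo "$host already listed in $workers_file"
  else
    echo "$host" >> "$workers_file"
    echo "added $host to $workers_file"
  fi
}
```

A usage example might be `add_worker "$HADOOP_HOME/etc/hadoop/workers" datanode4.example.com` on the NameNode, followed by starting the DataNode daemon on the new server (`hdfs --daemon start datanode` in Hadoop 3.x, or `hadoop-daemon.sh start datanode` in Hadoop 2.x). Running it twice leaves a single entry.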

What Are the Different Acronym Stacks in I.T.?

01/09/2018 (updated 04/28/2020)

Question
What are the different acronym stacks in I.T.?

Answer
There are many combinations of open source technologies in wide use. Their acronyms, referred to as “full stacks” or simply “stacks,” appear in articles and job descriptions. A full stack is a bundle of software (which may include the operating system) that can create a complete and functional product when properly configured.

…


How Do You Install Hadoop with a Script for Any Type of Linux Server?

01/05/2018 (updated 02/06/2022)

Updated on 1/6/21

Problem scenario
You want to install open source Hadoop. You may want a single-node or multi-node deployment with CentOS/Red Hat/Fedora, Debian/Ubuntu, and/or SUSE Linux distributions. You want most of the installation scripted, with the same script working on any variety of Linux. How do you install Hadoop quickly with a script that works on almost any type of Linux?

Solution
1.

…


What is Apache Parquet?

11/30/2017 (updated 02/10/2023)

Question
What is Apache Parquet?

Answer
Apache Parquet is a columnar data storage format for the Hadoop ecosystem. Values within a single column are largely uniform: they repeat one specific type and format of data (e.g., a long string of characters, a single character, or an integer), whereas two cells in the same row may hold very dissimilar types of data.

…


How Do You Install Apache Parquet?

11/29/2017 (updated 06/09/2019)

Problem scenario
You want to install Apache Parquet on the Hadoop NameNode. What do you do?

Solution
Prerequisite
This assumes that you have installed Hadoop. For directions, see this posting.

Procedure
Run these commands:

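sudo su -
# python-pip is the Debian/Ubuntu package that provides pip (python3-pip on newer releases).
# libsnappy-dev must be installed before python-snappy so its C extension can build.
apt-get -y install python-pip libsnappy-dev thrift-compiler
pip install thriftpy
# The PyPI package named "snappy" is unrelated; "python-snappy" provides the Snappy bindings.
pip install python-snappy
exit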

curl https://pypi.python.org/packages/74/b5/bc459aab0566fc3cf3397467922c37411ab6e3361bab9e0ca165e1089ce8/parquet-1.2.tar.gz#md5=05aacec0620ac63ecd7dd77bf7fb9fee >

…
