Big Data – Page 2 – CONTINUAL INTEGRATION

How Do You Troubleshoot “java.io.IOException: Stream closed at java.base/java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:442)”?

02/03/2021 (updated 08/29/2021)

Problem scenario
You are trying to run a Hadoop job. You get this error:
“java.io.IOException: Stream closed at java.base/java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:442)”

What should you do?

Solution
Check whether the “python” command is recognized. You may need to install Python, or symlink the python3 binary to a typical location that is on your PATH (e.g., /usr/bin/python).

Here are commands that could help you:

whereis python3
sudo ln -s "$(command -v python3)" /bin/python  # use the absolute path of python3 as the link target
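
To see how each interpreter name resolves before creating a link, a short check like the following may help. It is a minimal sketch that uses only the standard library; run it with whichever interpreter you do have (for example python3). The function name find_interpreters is made up for this illustration.

```python
import shutil


def find_interpreters(names=("python", "python3")):
    """Return {name: absolute path or None}, resolved the same way the shell searches PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If “python” is reported as not found while “python3” has a path, that path is the one to use as the symlink target.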

If you need help installing Python,

…


How Do You Troubleshoot a Python MapReduce Job That Returns “Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1”?

01/17/2021 (updated 08/29/2021)

Problem scenario
Your Python MapReduce (MR) job is failing. You do not know why.

You may or may not get an error like this: “Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1”

Another symptom you may see is that the MapReduce job takes a very long time and seems to hang. What should you do?

Solution

Determine which map .py file and which reduce .py file you are using.

…
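
PipeMapRed belongs to Hadoop Streaming, and “subprocess failed with code 1” means the mapper or reducer script itself exited with status 1. Once you know which two .py files are involved, one common way to narrow the problem down is to run them locally outside Hadoop. The sketch below is an illustration of that idea, not necessarily the approach the full post takes; the function name and the file names in the usage comment are hypothetical.

```python
import subprocess
import sys


def run_streaming_locally(mapper_cmd, reducer_cmd, input_text):
    """Approximate a streaming job on one machine: mapper -> sort -> reducer."""
    mapped = subprocess.run(mapper_cmd, input=input_text,
                            capture_output=True, text=True)
    if mapped.returncode != 0:
        raise RuntimeError(
            f"mapper exited with code {mapped.returncode}:\n{mapped.stderr}")

    # Hadoop sorts mapper output by key before the reduce phase; a plain
    # line sort is a rough stand-in for that step.
    shuffled = "".join(line + "\n" for line in sorted(mapped.stdout.splitlines()))

    reduced = subprocess.run(reducer_cmd, input=shuffled,
                             capture_output=True, text=True)
    if reduced.returncode != 0:
        raise RuntimeError(
            f"reducer exited with code {reduced.returncode}:\n{reduced.stderr}")
    return reduced.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python local_check.py mapper.py reducer.py < sample.txt
    mapper, reducer = sys.argv[1], sys.argv[2]
    print(run_streaming_locally([sys.executable, mapper],
                                [sys.executable, reducer],
                                sys.stdin.read()), end="")
```

If a script fails here, the Python traceback in the raised error usually says more than the Java exception does.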


How Do You Get Both MapReduce Jobs and hadoop CLI Commands to Work Simultaneously without Alternately Changing an XML File Before Each One?

01/10/2021 (updated 08/29/2021)

Problem scenario
Hadoop is not working correctly. You can get either a MapReduce job or a “hadoop” CLI command to work, but not both with the same configuration: you have to change an XML file before switching from one to the other.

When MapReduce jobs are failing (and the hadoop commands are working), you may see an error like this:

2021-01-01 22:47:42,337 INFO mapreduce.Job: Task Id : attempt_1609558624072_0001_m_000003_1,

…


How Do You Install and Configure Apache Camel on a Linux Server?

11/26/2020 (updated 08/29/2021)

Problem scenario
You want to install and configure Apache Camel from source on any distribution of Linux. What do you do?

Solution
You cannot. But there is a way to use Apache Camel as a test.

Background: Camel cannot simply be installed and configured like other applications. It is a framework for integrating messaging solutions or microservice architectures.

…


How Do You Get the NameNode Process to Start in Hadoop?

11/24/2020 (updated 08/29/2021)

Problem scenario
You have run start-dfs.sh and start-yarn.sh. You have stopped all the Hadoop services too. When you run “jps”, the NameNode is not showing up. You have tried a variety of different troubleshooting methods (including rebooting the NameNode). The NameNode has never worked correctly. You can delete all the data in the cluster because it never really worked. What should you do?

Solution
Run this, but remember that it will delete all your data:

hdfs namenode -format # Warning: this command will delete all your data in Hadoop …
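
To check whether the NameNode shows up in the jps output without reading it by eye, a small helper like the one below could be used. It is a sketch that assumes the usual jps line format of a process ID followed by a name; the function names are made up for this illustration.

```python
import shutil
import subprocess


def java_process_names(jps_output):
    """Return the process names from jps output (lines of the form '<pid> <name>')."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def namenode_is_running(jps_output):
    # Exact match, so that "SecondaryNameNode" is not mistaken for "NameNode".
    return "NameNode" in java_process_names(jps_output)


if __name__ == "__main__" and shutil.which("jps"):
    out = subprocess.run(["jps"], capture_output=True, text=True).stdout
    print("NameNode running:", namenode_is_running(out))
```

The exact-name comparison matters because a plain substring search for “NameNode” would also match a SecondaryNameNode process.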


How Do You Troubleshoot ‘Exception in thread “main” java.lang.NullPointerException org.apache.hadoop.mapreduce.tools.CLI.displayJobList’?

11/10/2020 (updated 08/29/2021)

Problem scenario
When you run hadoop commands, you get an error like this:

Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.mapreduce.tools.CLI.displayJobList(CLI.java:784)
at org.apache.hadoop.mapreduce.tools.CLI.displayJobList(CLI.java:769)
at org.apache.hadoop.mapreduce.tools.CLI.listAllJobs(CLI.java:697)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:428)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)

What should you do?

Solution
Find the mapred-site.xml file. Make sure it has these stanzas (between the <configuration> and </configuration> tags):

<property> …
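
To see which properties a mapred-site.xml file currently defines, a short script such as the following can list them. This is a sketch only: it assumes the standard Hadoop layout of property elements with name and value children, and it does not know which property names the full post asks for, so compare its output against those stanzas yourself. The function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def properties_from_root(root):
    """Return {name: value} for every <property> element under the given root."""
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def properties_from_text(xml_text):
    return properties_from_root(ET.fromstring(xml_text))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python list_props.py /path/to/mapred-site.xml
    root = ET.parse(sys.argv[1]).getroot()
    for name, value in sorted(properties_from_root(root).items()):
        print(f"{name} = {value}")
```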


How Do You Troubleshoot the Error “java.lang.IllegalArgumentException: query.max-memory-per-node set to 1GB, but only 80530637B of useable heap available”?

09/16/2020 (updated 08/29/2021)

Problem scenario
You try to start Presto. But you get an error like this:

2020-09-29T19:28:25.866Z WARN main io.airlift.jmx.JmxAgent Cannot determine if JMX agent is already running (not an Oracle JVM?). Will try to start it manually.
2020-09-29T19:28:25.916Z INFO main io.airlift.jmx.JmxAgent JMX agent started and listening on ip-172-31-103-99.us-east-2.compute.internal:44377
2020-09-29T19:28:26.030Z ERROR Discovery-0 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (collector/general): Lookup of collector failed for http://localhost:8080/v1/service/collector/general
2020-09-29T19:28:26.066Z ERROR Discovery-2 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://localhost:8080/v1/service/presto/general
2020-09-29T19:28:27.161Z WARN http-client-shared-29 com.facebook.presto.metadata.RemoteNodeState Error fetching node state from http://172.31.103.99:8080/v1/info/state: Server refused connection: http://172.31.103.99:8080/v1/info/state
2020-09-29T19:28:28.227Z ERROR main com.facebook.presto.server.PrestoServer Unable to create injector,

…
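
The two numbers in the error message are in different units, which hides how far apart they are. The arithmetic below is a minimal sketch that puts both in MiB; it assumes “1GB” means 2**30 bytes, and the function name is made up for this illustration.

```python
MIB = 2 ** 20
GIB = 2 ** 30

configured_bytes = 1 * GIB   # "1GB" from the error, assumed to mean 2**30 bytes
available_bytes = 80530637   # "80530637B" from the error


def shortfall_mib(configured, available):
    """Return how far the configured limit exceeds the available heap, in MiB."""
    return (configured - available) / MIB


if __name__ == "__main__":
    print(f"available heap:   {available_bytes / MIB:.1f} MiB")
    print(f"configured limit: {configured_bytes / MIB:.1f} MiB")
    print(f"shortfall:        {shortfall_mib(configured_bytes, available_bytes):.1f} MiB")
```

The usable heap is roughly 77 MiB against a 1024 MiB limit, so either query.max-memory-per-node has to come down or the JVM heap (the -Xmx setting) has to go up. Which of these the full post recommends is not shown in this excerpt.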


How Do You Troubleshoot Presto Errors Like “java.lang.RuntimeException: java.net.BindException: Address already in use”?

09/15/2020 (updated 10/11/2020)

Problem scenario
You are getting errors like these when you run Presto:

2020-09-29T19:54:59.630Z ERROR Discovery-0 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (collector/general): Lookup of collector failed for http://localhost:8888/v1/service/collector/general
2020-09-29T19:54:59.828Z ERROR Discovery-0 io.airlift.discovery.client.CachingServiceSelector Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://localhost:8888/v1/service/presto/general
2020-09-29T19:55:02.734Z INFO Discovery-0 io.airlift.discovery.client.CachingServiceSelector Discovery server connect succeeded for refresh (collector/general)
2020-09-29T19:55:02.756Z INFO Discovery-1 io.airlift.discovery.client.CachingServiceSelector Discovery server connect succeeded for refresh (presto/general)
2020-09-29T19:55:05.303Z INFO main org.eclipse.jetty.server.Server jetty-9.3.9.M1
2020-09-29T19:55:05.359Z WARN main org.eclipse.jetty.server.handler.AbstractHandler No Server set for org.eclipse.jetty.server.handler.ErrorHandler@68b366e2
2020-09-29T19:55:07.050Z INFO main org.eclipse.jetty.server.handler.ContextHandler Started o.e.j.s.ServletContextHandler@3513c84c{/,null,AVAILABLE,@http}
2020-09-29T19:55:07.303Z INFO main org.eclipse.jetty.server.Server jetty-9.3.9.M1
2020-09-29T19:55:07.330Z WARN main org.eclipse.jetty.server.handler.AbstractHandler No Server set for org.eclipse.jetty.server.handler.ErrorHandler@61da0413
2020-09-29T19:55:07.688Z INFO main org.eclipse.jetty.server.handler.ContextHandler Started o.e.j.s.ServletContextHandler@336f49a1{/,null,AVAILABLE,@http}
2020-09-29T19:55:07.975Z ERROR main com.facebook.presto.server.PrestoServer Unable to create injector,

…
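
“Address already in use” means another process is already bound to a port that Presto wants. One way to test a given port is to try binding to it yourself, as in the sketch below. The function name is hypothetical, and port 8888 is taken from the discovery URL in the log above; the port that actually conflicts may be a different one.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. a permission error on ports below 1024
        return True


if __name__ == "__main__":
    # 8888 comes from the log lines; treat it as an assumption.
    print("port 8888 free:", port_is_free(8888))
```

The probe socket is closed as soon as the check finishes, so the check itself does not hold the port.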


How Do You Get Presto to Start When You Get Errors about Presto Requiring an Oracle or OpenJDK JVM?

09/02/2020 (updated 01/03/2021)

Problem scenario
Presto will not run because of an error about the JVM.

You get a message like one of these:
“Java version has an unknown format: 15” (from “C” below)
“Presto requires an Oracle or OpenJDK JVM (found Ubuntu)” (associated with the Java version shown in B or D below)
“Presto requires an Oracle or OpenJDK JVM (found Private Build)” (associated with the Java version in E below)
“Presto requires an Oracle or OpenJDK JVM (found IcedTea)” (associated with the Java version shown in F below)

Solution
For detailed background information,

…


How Do You Troubleshoot the Presto Error “‘PK\003\004’: command not found”?

08/28/2020 (updated 12/29/2020)

Problem scenario
You try to start Presto from a command line interface. But you get an error message with garbled output like this:

./presto: line 1: $'PK\003\004': command not found
./presto: line 2: $'ܢ\327H': command not found
./presto: line 3: syntax error near unexpected token `)'
./presto: line 3: ۢ▒H▒▒W▒META-INF/MANIFEST.MF▒P▒j!▒▒▒?▒8▒▒ɸK)'

Possible solution #1
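You may have downloaded a .jar file that could be unpacked. A .jar file is a ZIP archive, and “PK\003\004” is the signature at the start of a ZIP file, so the shell is trying to run an archive as if it were a script.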

…
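
The “PK\003\004” in the first error line corresponds to the bytes 0x50 0x4B 0x03 0x04, which is how a ZIP archive (and therefore a .jar file) begins. To confirm that the file you are trying to run starts that way, a check along these lines may help. It is a minimal sketch with made-up function names.

```python
import sys

ZIP_SIGNATURE = b"PK\x03\x04"  # first four bytes of a ZIP local file header


def starts_with_zip_signature(data):
    """Return True if the given bytes begin with the ZIP signature."""
    return data[:4] == ZIP_SIGNATURE


def file_is_zip_like(path):
    """Read the first four bytes of a file and compare them with the ZIP signature."""
    with open(path, "rb") as handle:
        return starts_with_zip_signature(handle.read(4))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python check_zip.py ./presto
    print(file_is_zip_like(sys.argv[1]))
```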
