How Do You Troubleshoot a Python MapReduce Job That Returns “Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1”?

Problem scenario
Your Python MapReduce (MR) job is failing, and you do not know why.

You may or may not get an error like this: "Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1"

Another symptom you may experience is that the MapReduce job takes a very long time and seems to hang. What should you do?

Solution

  1. Determine which mapper .py file and which reducer .py file you are using. For this example, let's assume you are using /tmp/mapper.py and /tmp/reducer.py. Find the input file (or one of the input files). Let's say you are using /tmp/foobar.txt.
  2. Run a command like this: cat /tmp/foobar.txt | python3 /tmp/mapper.py

  3. Did it fail with an error, or did it run error-free? If there was an error, the problem is in mapper.py; read the Python traceback and fix the code there. If there was no error, run a command like this:

cat /tmp/foobar.txt | python3 /tmp/mapper.py | python3 /tmp/reducer.py

If that command produced an error, fix reducer.py. If it produced no error, your mapper.py and reducer.py files seem good, so run the MapReduce job on one input file at a time to see whether a particular input triggers the failure. Also check the command that launches the job: is it pointing to different map and reduce Python programs than the ones you think are being used?
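
The manual checks above can also be scripted. The following is a minimal sketch and not part of the original procedure; the function names run_stage and debug_locally are hypothetical helpers made up for this example. It runs the mapper, then the reducer, and reports which stage exited with a non-zero code together with its stderr. Python exits with code 1 after an uncaught exception, which is the code that the "subprocess failed with code 1" message reports. The sketch also sorts the mapper output before the reducer, because Hadoop Streaming sorts mapper output by key before the reducer reads it; sorting whole lines is only a rough local approximation of that step.

```python
import subprocess
import sys


def run_stage(script_path, input_bytes):
    """Run one script with input_bytes on stdin.

    Returns (exit_code, stdout_bytes, stderr_bytes).
    """
    proc = subprocess.run(
        [sys.executable, script_path],
        input=input_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout, proc.stderr


def debug_locally(mapper_path, reducer_path, input_path):
    """Return "mapper" or "reducer" for the first failing stage, else None."""
    with open(input_path, "rb") as f:
        data = f.read()

    code, mapped, err = run_stage(mapper_path, data)
    if code != 0:
        print(f"mapper exited with code {code}:")
        print(err.decode(errors="replace"))
        return "mapper"

    # Assumption: sorting whole lines stands in for the sort that
    # Hadoop Streaming performs between the map and reduce phases.
    shuffled = b"".join(line + b"\n" for line in sorted(mapped.splitlines()))

    code, reduced, err = run_stage(reducer_path, shuffled)
    if code != 0:
        print(f"reducer exited with code {code}:")
        print(err.decode(errors="replace"))
        return "reducer"

    # Both stages succeeded; show what the reducer produced.
    sys.stdout.write(reduced.decode(errors="replace"))
    return None
```

With the example paths from step 1, a call would look like debug_locally("/tmp/mapper.py", "/tmp/reducer.py", "/tmp/foobar.txt"). A return value of None means both scripts ran cleanly on that input, which points back to the job command or to a different input file.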
