How Do You Troubleshoot the Hadoop Message “Exception in thread “main” java.nio.file.AccessDeniedException: /home/jdoe/./mapper.py”?

Problem scenario
You are trying to run a hadoop command to kick off a MapReduce job, but you get this error:
“Exception in thread “main” java.nio.file.AccessDeniedException: /home/jdoe/./mapper.py”

What should you do?

Solution (short version)
Change to a directory that the user can write to, then retry the command.

Solution (long version)
Create a directory that is owned by the user running this command (and by the group associated with that user), change to that directory, and retry the command.
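A minimal Bash sketch of the long version. The helper names and the hadoop-work directory are hypothetical examples, not part of Hadoop. The sketch creates the directory, assigns it to the current user's UID and primary GID, and lets you confirm that the directory is writable before you retry the job.

```bash
#!/usr/bin/env bash
# Sketch: create a working directory owned by the current user and group.
# can_write_here, make_user_workdir and "hadoop-work" are hypothetical names.

# can_write_here [DIR]: succeed if DIR (default: current directory)
# exists and is writable by the current user.
can_write_here() {
  local dir="${1:-.}"
  [ -d "$dir" ] && [ -w "$dir" ]
}

# make_user_workdir DIR: create DIR, assign it to the current user's UID and
# primary GID, and print its path.
make_user_workdir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  # If DIR already existed and belongs to another user, this chown fails
  # unless it is run with sudo.
  chown "$(id -u):$(id -g)" "$dir" || return 1
  printf '%s\n' "$dir"
}

# Example usage (then retry the hadoop command from the new directory):
#   workdir=$(make_user_workdir "$HOME/hadoop-work") && cd "$workdir"
#   can_write_here || echo "still not writable" >&2
```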

How Do You Troubleshoot the Ansible Error ‘”hadoop_env” is undefined’?

Problem scenario
You try to run a playbook. But you get a message like this: ‘fatal: … “AnsibleUndefinedVariable” ‘hadoop_env’ is undefined’. What should you do?

Root cause: a variable that is referenced (e.g., in a playbook or role) is not assigned a value when you run the playbook.

Possible solution #1
Find the vars directory (or create it if it does not exist) and define hadoop_env in a file there.
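A minimal Bash sketch, assuming hadoop_env is meant to be a role variable. Ansible loads a role's vars/main.yml automatically when the role runs. The role path, the helper name and the placeholder value are hypothetical. Substitute the value your playbook actually expects.

```bash
#!/usr/bin/env bash
# Sketch: make sure a role's vars/main.yml defines a variable.
# ensure_role_var is a hypothetical helper, not part of Ansible.

# ensure_role_var ROLE_DIR NAME VALUE  (VALUE must not contain double quotes)
ensure_role_var() {
  local role_dir="$1" name="$2" value="$3"
  local vars_file="$role_dir/vars/main.yml"

  # Find the vars directory or create it.
  mkdir -p "$role_dir/vars" || return 1

  # Start a new YAML document if the file does not exist yet.
  [ -f "$vars_file" ] || echo '---' > "$vars_file"

  # Append the variable only if it is not already defined at the top level.
  if ! grep -q "^${name}:" "$vars_file"; then
    printf '%s: "%s"\n' "$name" "$value" >> "$vars_file"
  fi
}

# Example (hypothetical role path and value):
#   ensure_role_var roles/hadoop hadoop_env "CHANGE_ME"
```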

How Do You Install Apache Flink on Any Type of Linux?

Problem scenario
You want a script to install Apache Flink on any type of Linux (including CentOS/RHEL/Fedora, Debian/Ubuntu and SUSE). What should you do?

Solution
Prerequisites
When using Ubuntu/Debian or SUSE servers, 1 GB of RAM is sufficient. For CentOS/RHEL/Fedora, we recommend either more RAM or virtual memory (swap space). To find a script that will create swap space,
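Before you install, you can check how much RAM and swap a server has. The following is a minimal Bash sketch that reads /proc/meminfo, which reports values in kB. The helper names are hypothetical. The threshold you pass in is your own choice and is not a documented Flink requirement.

```bash
#!/usr/bin/env bash
# Sketch: check RAM plus swap before installing Flink.
# meminfo_mib and has_enough_memory are hypothetical helper names.

# meminfo_mib FIELD [FILE]: print FIELD from a /proc/meminfo-style file in MiB.
meminfo_mib() {
  local field="$1" file="${2:-/proc/meminfo}"
  # Values are in kB; integer-divide by 1024. Exit 1 if FIELD is absent.
  awk -v f="${field}:" '$1 == f { printf "%d\n", $2 / 1024; found = 1 }
    END { exit !found }' "$file"
}

# has_enough_memory MIN_MIB [FILE]: succeed if RAM + swap >= MIN_MIB.
has_enough_memory() {
  local min="$1" file="${2:-/proc/meminfo}" mem swap
  mem=$(meminfo_mib MemTotal "$file") || return 2
  swap=$(meminfo_mib SwapTotal "$file") || swap=0
  echo "RAM: ${mem} MiB, swap: ${swap} MiB" >&2
  [ $((mem + swap)) -ge "$min" ]
}

# Example (2048 MiB is an arbitrary illustration):
#   has_enough_memory 2048 || echo "consider adding swap space" >&2
```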

How Do You Install Apache Presto on Any Type of Linux?

Problem scenario
You want to install Apache Presto to try it out. How do you do this with a script that will work on Debian/Ubuntu, CentOS/RHEL/Fedora and/or SUSE?

Solution
Prerequisites
i. You have Python installed as “python” (not just python3). If you need assistance installing Python, see this posting. Verify that python --version works.
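You can check this prerequisite with the following minimal Bash sketch (check_python_alias is a hypothetical name). The sketch fails if the bare python command is missing, even when python3 exists. Otherwise it prints the version. It merges stderr into stdout because Python 2 printed its version on stderr.

```bash
#!/usr/bin/env bash
# Sketch: confirm the prerequisite "python" (not just python3) is on PATH.
# check_python_alias is a hypothetical helper name.

# check_python_alias [CMD]: print CMD's version, or fail if CMD is missing.
check_python_alias() {
  local cmd="${1:-python}"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "'$cmd' is not on PATH (having only python3 does not satisfy this)" >&2
    return 1
  fi
  # Python 2 prints its version on stderr, so merge stderr into stdout.
  "$cmd" --version 2>&1
}

# Example:
#   check_python_alias || echo "install Python as \"python\" first" >&2
```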

How Do You Troubleshoot the Message “ERROR: but there is no YARN_RESOURCEMANAGER_USER defined.”?

Problem scenario
You run sudo bash start-yarn.sh but you receive this message:

ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.

What should you do?

Solution
1. Modify start-yarn.sh. Underneath the last section of comments, place three lines with the following text:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

2. Modify stop-yarn.sh the same way: place the same three lines underneath its last section of comments.
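The following is a minimal Bash sketch of both steps. It assumes that "underneath the last section of comments" means directly after the comment block at the top of each script. It also assumes that the scripts live in $HADOOP_HOME/sbin. The helper name is hypothetical. Back up both scripts before editing them.

```bash
#!/usr/bin/env bash
# Sketch: add the three user variables to start-yarn.sh / stop-yarn.sh.
# Assumption: the lines go right after the leading comment block.
# add_yarn_users is a hypothetical helper name.

YARN_USER_LINES='YARN_RESOURCEMANAGER_USER=root\nHADOOP_SECURE_DN_USER=yarn\nYARN_NODEMANAGER_USER=root'

# add_yarn_users FILE: insert the lines once, after the leading comment block.
add_yarn_users() {
  local file="$1" tmp
  # Do nothing if the file already defines the variables.
  grep -q '^YARN_RESOURCEMANAGER_USER=' "$file" && return 0
  tmp=$(mktemp) || return 1
  # awk -v expands the \n escapes; the block goes before the first non-comment line.
  awk -v block="$YARN_USER_LINES" '
    !done && !/^#/ { print block; done = 1 }
    { print }
    END { if (!done) print block }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the script keeps its execute permission.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (run as a user that can write these files):
#   for f in start-yarn.sh stop-yarn.sh; do add_yarn_users "$HADOOP_HOME/sbin/$f"; done
```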

How Do You Solve This Problem “Error: Could not find or load main class org.apache.hadoop.util.VersionInfo”

Problem scenario
You run “hadoop version” but you receive this message “Error: Could not find or load main class org.apache.hadoop.util.VersionInfo”. What do you do?

Possible Solution #1
Use “sudo” before the “hadoop version” command (i.e., sudo hadoop version).

Possible Solution #2
Use “sudo -i” before the “hadoop version” command (i.e., sudo -i hadoop version).

Possible Solution #3
Use a different user.
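The following is a minimal Bash sketch that tries the first two solutions in order. It runs hadoop version as the current user first, then with sudo, and then with sudo -i. The helper name is hypothetical. Pass the full path to the hadoop executable if it is not on root's PATH. The sudo attempts may prompt for a password.

```bash
#!/usr/bin/env bash
# Sketch: try the approaches above in order until "hadoop version" works.
# hadoop_version_any_way is a hypothetical helper name.

hadoop_version_any_way() {
  local cmd="${1:-hadoop}"
  # 1. As the current user (errors from this attempt are hidden).
  if "$cmd" version 2>/dev/null; then
    return 0
  fi
  command -v sudo >/dev/null 2>&1 || return 1
  # 2. With sudo.
  if sudo "$cmd" version; then
    return 0
  fi
  # 3. With sudo -i, which runs the command in root's login environment.
  sudo -i "$cmd" version
}

# Solution #3 (a different user) can be tried with, for example,
# where OTHER_USER is a placeholder:
#   sudo -u OTHER_USER -i hadoop version
```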

How Do You Get Hadoop Commands to Work from Any Directory without Using the Full Path?

Problem scenario
Hadoop is installed on Linux. But hadoop version and other hadoop commands are not working. What should you do?

Solution
Find the hadoop executable, which lives in a directory named bin. It is often “/usr/local/hadoop/bin/hadoop”. Ultimately you need the directory that contains this “bin” subdirectory. To find the directory that has a “bin” subdirectory with “hadoop” inside, run these two commands:

sudo find / -name hadoop -type f
whereis hadoop

Run these commands interactively, where “/usr/local/hadoop” is the parent directory of the “bin” subdirectory that contains the hadoop executable.
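The following is a minimal Bash sketch that derives that parent directory from the executable's path. It sets HADOOP_HOME (the conventional variable name) and appends its bin directory to PATH for the current shell. The helper names are hypothetical. To make the change persistent, you would put equivalent export lines in a startup file such as ~/.bashrc (this assumes Bash is your login shell).

```bash
#!/usr/bin/env bash
# Sketch: derive the Hadoop home directory from the executable's path and put
# its bin directory on PATH. Helper names are hypothetical; pass an absolute path.

# hadoop_home_from_exe EXE: print the parent of the "bin" directory holding EXE,
# e.g. /usr/local/hadoop/bin/hadoop -> /usr/local/hadoop.
hadoop_home_from_exe() {
  local exe="$1" bindir
  bindir="${exe%/*}"                        # strip "/hadoop"
  [ "${bindir##*/}" = "bin" ] || return 1   # must sit inside a bin directory
  printf '%s\n' "${bindir%/*}"              # strip "/bin"
}

# add_hadoop_to_path EXE: export HADOOP_HOME and append its bin to PATH once.
add_hadoop_to_path() {
  local home
  home=$(hadoop_home_from_exe "$1") || return 1
  export HADOOP_HOME="$home"
  case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) ;;                      # already present
    *) export PATH="$PATH:$HADOOP_HOME/bin" ;;
  esac
}

# Example (run in the interactive shell):
#   add_hadoop_to_path /usr/local/hadoop/bin/hadoop && hadoop version
```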

How Do You Know if Hadoop is Installed (and the version if it is installed) on Linux SUSE?

Problem scenario
You are administering Linux SUSE machines. You want to see if Hadoop is installed on them. The command hadoop version does not work.

Solution
Run this command:

sudo find / -name hadoop -type f

From the results above, you can probably find the path of the executable (typically one ending in bin/hadoop). It will likely not be in /var/ or /tmp/.
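The following is a minimal Bash sketch that narrows the find output to likely executables and then runs version on each one. The helper names are hypothetical. The filter follows the hint above: it keeps paths ending in /bin/hadoop and skips /var/ and /tmp/. Stdin is redirected from /dev/null for each hadoop call so that the command cannot consume the remaining list of paths.

```bash
#!/usr/bin/env bash
# Sketch: narrow the find results to likely executables and print each version.
# filter_hadoop_candidates and report_hadoop_versions are hypothetical names.

# filter_hadoop_candidates: read paths on stdin; keep ones ending in
# /bin/hadoop and skip /var/ and /tmp/, which are unlikely install locations.
filter_hadoop_candidates() {
  local p
  while IFS= read -r p; do
    case "$p" in
      /var/*|/tmp/*) ;;
      */bin/hadoop) printf '%s\n' "$p" ;;
    esac
  done
}

# report_hadoop_versions: search the filesystem and run "version" on each hit.
report_hadoop_versions() {
  local exe
  sudo find / -name hadoop -type f 2>/dev/null | filter_hadoop_candidates |
    while IFS= read -r exe; do
      echo "== $exe"
      "$exe" version </dev/null
    done
}
```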