How Do You Troubleshoot a Fatal HDFS Error?

Problem scenario
You run an hdfs command and you get this:

[Fatal Error] core-site.xml:2:6: The processing instruction target matching "[xX][mM][lL]" is not allowed.
17/09/25 04:21:00 FATAL conf.Configuration: error parsing conf core-site.xml
org.xml.sax.SAXParseException; systemId: file:/home/hadoop/hadoop/etc/hadoop/core-site.xml; lineNumber: 2; columnNumber: 6; The processing instruction target matching "[xX][mM][lL]" is not allowed.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2531)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2519)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2590)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2543)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2426)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1151)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1123)
        at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1459)
        at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:322)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:488)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:378)

What is the problem?

Solution
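Open the core-site.xml file named in the error (here, /home/hadoop/hadoop/etc/hadoop/core-site.xml).  The error means the XML declaration ("<?xml ...?>") is not at the very start of the file; "lineNumber: 2" shows that it sits on the second line.  Delete any blank lines (or spaces) above it so that the first line of the file is the "<?xml..." line.  The same error appears if a second "<?xml ...?>" declaration occurs further down in the file (for example, after two files were pasted together); delete the extra one.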

For example, the top two lines may look like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
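If you have several configuration files to check, a small script can do the same cleanup.  The Python sketch below assumes the file is UTF-8 and that the only problem is whitespace before the XML declaration; the function names are hypothetical.  It strips the leading whitespace, confirms that the result parses as XML, and saves the original as a .bak copy before overwriting it.

```python
import shutil
import xml.etree.ElementTree as ET


def strip_leading_blank_lines(text: str) -> str:
    """Remove blank lines and spaces that precede the <?xml ...?> declaration."""
    return text.lstrip()


def parses_ok(text: str) -> bool:
    """Return True if the text is well-formed XML according to ElementTree."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def fix_config_file(path: str) -> bool:
    """Rewrite `path` without leading whitespace; return True if the file was changed.

    A copy of the original file is saved as <path>.bak first.
    """
    with open(path, encoding="utf-8") as f:
        original = f.read()
    fixed = strip_leading_blank_lines(original)
    if fixed == original or not parses_ok(fixed):
        return False  # nothing to strip, or the file has some other problem
    shutil.copyfile(path, path + ".bak")
    with open(path, "w", encoding="utf-8") as f:
        f.write(fixed)
    return True
```

For the error above you would call fix_config_file("/home/hadoop/hadoop/etc/hadoop/core-site.xml"); a return value of False means there was no leading whitespace, so look for a duplicate declaration instead.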

How Do You Troubleshoot Nginx Not Being Removed from Ubuntu Linux Like It Should?

Problem scenario
You want to remove Nginx from an Ubuntu server.  You run this: sudo apt-get remove nginx

But you receive this error:
"
Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt-get -f install' to correct these:
The following packages have unmet dependencies:
 nginx-dbg : Depends: nginx (= 1.12.1-1~xenial)
E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a solution).
"

What do you do to remove Nginx?

Solution
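The nginx-dbg package depends on nginx (= 1.12.1-1~xenial), so apt will not remove nginx while nginx-dbg is still installed.  Remove nginx-dbg first with this command:  sudo apt-get remove nginx-dbg
Then remove nginx with a command like this:  sudo apt-get remove nginx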
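When the list of unmet dependencies is long, you can pull the dependent package names out of apt's message instead of reading them by eye.  This Python sketch assumes the message format shown above (one "package : Depends: other" line per dependent package); the function names are hypothetical.  It also builds a single apt-get remove command, since apt-get accepts several package names at once and can remove the dependent package together with nginx.

```python
import re

# Matches lines such as " nginx-dbg : Depends: nginx (= 1.12.1-1~xenial)".
# Assumption: continuation lines that omit the package name are not handled.
UNMET = re.compile(r"^\s*(\S+)\s*:\s*Depends:\s*(\S+)", re.MULTILINE)


def blocking_packages(apt_output: str, target: str) -> list:
    """Return the packages that depend on `target`, per apt's unmet-dependency report."""
    return [pkg for pkg, dep in UNMET.findall(apt_output) if dep == target]


def removal_command(apt_output: str, target: str) -> str:
    """Build one apt-get command that removes the dependents together with the target."""
    packages = blocking_packages(apt_output, target) + [target]
    return "sudo apt-get remove " + " ".join(packages)
```

Pass the error text you received and "nginx" to removal_command to get the command to run.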

How Do You Eliminate Highlighting of Text in MS Word 2016?

Problem scenario
Some text in MS Word 2016 is highlighted with a color as part of its formatting.  No matter what you do (including highlighting it and clicking "No color") the text remains highlighted.   What should you do?

Solution
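Select the text (with the mouse, or by holding Shift and pressing the arrow keys), then press Ctrl+Spacebar (hold Ctrl and tap the Spacebar).  This should remove the manual character formatting, including the highlighting.  Note that it also removes other manual character formatting, such as bold or italics.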

How Do You Get around the Message “AUTHENTICATING FOR org.freedesktop.systemd1.manage-units”?

Problem scenario
You are running a yum command or running a script on a RHEL server.  You get this prompt:

"==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: Cloud User (ec2-user)
Password:
"

How do you get past this prompt when no password seems to work?

Solution
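Cancel out of the prompt (for example, with Ctrl-C).  The prompt comes from polkit: the command tried to manage a systemd service without root privileges, so polkit asks for the password of the user shown.  On AWS, ec2-user typically has no password set, which is why no password works.  Re-run the command with "sudo " in front of it.  Alternatively, you could run "sudo su -" to become the root user, but prefixing the command you were trying to run with "sudo " is the better practice.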
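If a script of yours keeps triggering this prompt, it can add sudo itself when it is not already running as root.  The Python sketch below is a minimal illustration of that idea; with_sudo and run_privileged are hypothetical names, and the systemctl command in the usage line is only an example.

```python
import os
import subprocess


def with_sudo(cmd: list, euid: int) -> list:
    """Prefix the command with sudo unless the effective user is already root (UID 0)."""
    return list(cmd) if euid == 0 else ["sudo"] + list(cmd)


def run_privileged(cmd: list) -> int:
    """Run the command with root privileges and return its exit code (Linux only)."""
    return subprocess.run(with_sudo(cmd, os.geteuid())).returncode
```

For example, run_privileged(["systemctl", "restart", "nginx"]) runs "sudo systemctl restart nginx" for ec2-user and runs the command unchanged for root.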

How Do You Deploy Nginx to a Docker Container on an AWS Linux Server?

Problem scenario
You installed Docker (on either Ubuntu or RedHat, see the links if you actually need help with that).  You do not want to create a Docker network on your server. How do you create a simple Docker container with Nginx without creating a user-defined network?

Solution
#1  Run these two commands:
docker pull nginx
docker run --name docker-nginx -p 80:80 nginx

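# The command does not return to the prompt because, without the -d (detached) option, "docker run" keeps the container attached to your terminal and shows Nginx's output there.  Press Ctrl-C to get your prompt back; this stops the container, which is why step #2 starts it again.  Other people have run into this behavior as well.  To avoid it, you can add -d to run the container in the background:  docker run -d --name docker-nginx -p 80:80 nginx
# To read more about the "docker run" command appearing to hang, you can see this link or this Docker website posting.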

#2  Run two more commands:
docker ps -a

# Find the alphanumeric container ID in the results of the above command.  Then replace abcd1234 in the following command with that container ID:

docker start abcd1234
# docker start also accepts the container name, so "docker start docker-nginx" works as well.
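If you would rather not copy the ID by hand, a short script can read it from the docker ps -a output.  The Python sketch below assumes Docker's default table layout, in which the container ID is the first column and the container name is the last; container_id_for_name and start_container are hypothetical helpers.

```python
import subprocess


def container_id_for_name(ps_output: str, name: str):
    """Return the container ID whose NAMES column contains `name`, or None."""
    for line in ps_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if columns and name in columns[-1].split(","):
            return columns[0]
    return None


def start_container(name: str = "docker-nginx") -> None:
    """Look up the container by name in `docker ps -a` and start it by ID."""
    ps_output = subprocess.run(["docker", "ps", "-a"], capture_output=True,
                               text=True, check=True).stdout
    container_id = container_id_for_name(ps_output, name)
    if container_id is None:
        raise SystemExit(f"No container named {name!r} found")
    subprocess.run(["docker", "start", container_id], check=True)
```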

#3  Now, assuming inbound connections to your AWS instance over port 80 are allowed (this depends on the configuration of the Security Group in AWS that governs your Linux server), you should be able to open a web browser and enter the external IP address of the Linux server.  To find the external IP address from the back end, run this command from the Linux server's command prompt:  curl http://icanhazip.com
Enter the resulting IP address in a web browser to test Nginx in the Docker container; you should see the default "Welcome to nginx!" page.
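To check from a script instead of a browser, you can fetch the page and look for the "Welcome to nginx!" heading of Nginx's default page.  This Python sketch uses only the standard library; the function names are hypothetical, and 203.0.113.10 in the usage line is a placeholder for your server's external IP address.

```python
import urllib.request


def is_nginx_welcome_page(html: str) -> bool:
    """True if the HTML looks like Nginx's default welcome page."""
    return "Welcome to nginx!" in html


def check_nginx(ip: str, timeout: float = 5.0) -> bool:
    """Fetch http://<ip>/ and report whether Nginx's default page came back."""
    with urllib.request.urlopen(f"http://{ip}/", timeout=timeout) as resp:
        return is_nginx_welcome_page(resp.read().decode("utf-8", errors="replace"))
```

Run check_nginx("203.0.113.10") from a machine outside AWS.  A timeout often points to the Security Group blocking inbound port 80.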

How Do You Troubleshoot the Docker Error “User specified IP address is supported on user defined networks only”?

Problem scenario
When using a "docker run" command you get this error: "Error response from daemon: User specified IP address is supported on user defined networks only."  What do you do to create a user-defined network?

Solution
#1  From the Linux server (the Docker host), run this command:  ip addr show

#2  Look at the results for eth0.  Find its "inet" IP address and subnet mask (e.g., 172.31.38.151/20).  This is the internal IP address of the Docker server, with its subnet mask in shorthand (CIDR) notation.  To choose the IP address for the Docker network you will create, increment the last octet by 1 and keep the subnet mask for the command below.  If the incremented octet would be above 255, pick an unused IP address that is within the same subnet as the server's IP address.

#3  Run a command like the following.  You may want to replace "isolated_nw1" with a name of your choice for this user-defined network.  You must replace x.x.x.x with the IP address this network will have (possibly just the server's internal IP address with the last octet increased by one).  You will likely replace 20 with the shorthand subnet mask of the server's internal IP address.

docker network create --driver bridge isolated_nw1 --subnet x.x.x.x/20

The result may look like this:

docker network create --driver bridge isolated_nw1 --subnet 172.31.38.152/20

Take note of this IP address and mentally increment the last octet once more.  You will use that incremented IP address later for a Docker container.  This should fix the problem of not having a Docker user-defined network.
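The "increment the last octet" arithmetic can be done with Python's ipaddress module, which also handles the case where the octet would pass 255 (172.31.38.255 plus one becomes 172.31.39.0, still inside a /20) and refuses addresses outside the server's subnet.  This is a sketch of the steps above, not a Docker tool; next_address and network_create_command are hypothetical names.

```python
import ipaddress


def next_address(cidr: str, step: int = 1) -> str:
    """Add `step` to the IP in `cidr` (e.g. '172.31.38.151/20'), keeping the prefix length."""
    iface = ipaddress.ip_interface(cidr)
    candidate = iface.ip + step
    # Reject anything outside the server's subnet or equal to its broadcast address.
    if candidate not in iface.network or candidate == iface.network.broadcast_address:
        raise ValueError(f"{candidate} is not a usable address in {iface.network}")
    return f"{candidate}/{iface.network.prefixlen}"


def network_create_command(server_cidr: str, name: str = "isolated_nw1") -> str:
    """Build the docker network create command from the server's eth0 address."""
    return f"docker network create --driver bridge {name} --subnet {next_address(server_cidr)}"
```

next_address(server_cidr, step=2) gives the address for the later Docker container.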

If you are interested in networking containers, you may want to read a short posting about Flannel or Calico.

How Do You Choose the Right Spell When Using Conjure-Up and The “Enter” Key Does Not Work?

Problem scenario
The conjure-up menu is not working.  You run a command like "conjure-up kubernetes" and reach a screen with an orange banner at the top.  The "Tab" key works, but "Enter" only works on the "Quit" button, and there is no error message.  How do you proceed with this conjure-up menu (e.g., to deploy Kubernetes) when the only option that works is "Quit"?

Solution
Enlarge your PuTTY window (make it wider and taller) so the conjure-up menu has more room.  If the terminal is too small to show all the options in the menu, the menu throws no errors, but it does not work correctly either: the "Enter" key has no effect unless "Quit" is highlighted.  Once the window is large enough, the "Enter" key works and lets you proceed.
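Before launching conjure-up you can check how large the terminal window is.  The Python sketch below uses shutil.get_terminal_size from the standard library; the 120-column by 40-row minimum is a hypothetical threshold chosen for illustration, not a documented conjure-up requirement, so adjust it until the whole menu fits.

```python
import shutil

# Hypothetical minimum size; conjure-up does not document this exact value.
MIN_COLUMNS = 120
MIN_LINES = 40


def big_enough(columns: int, lines: int,
               min_columns: int = MIN_COLUMNS, min_lines: int = MIN_LINES) -> bool:
    """True if a terminal of the given size meets the chosen minimum."""
    return columns >= min_columns and lines >= min_lines


def terminal_report() -> str:
    """Describe the current terminal size and whether it meets the minimum."""
    size = shutil.get_terminal_size()
    verdict = "large enough" if big_enough(size.columns, size.lines) else "enlarge the PuTTY window"
    return f"{size.columns}x{size.lines}: {verdict}"
```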

How Do You Generate a Load for Your Nginx or Apache Web Server?

Problem scenario
You have set up HTTP load balancing for your Nginx or Apache web server.  You want to test it by generating a significant, artificial load of traffic.  How do you do this?

Solution
Here are three scripts that each download a web page 100 times: one in Bash, one in Python (it calls curl, so the OS must be Linux), and one in PowerShell.  If you have a Windows workstation and a Linux computer, you can run them together to generate traffic for your web page.  To increase the load, make the web page larger (with more text), or change the "100" in the scripts to "1000" so they run longer.

Here is a bash script that will download a web page 100 times (just replace 10.10.0.1 with the URL you want to download):

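#!/bin/bash
# Download the page once; curl prints the HTML to standard output
function callit {
   curl 'http://10.10.0.1'
}
i=0
while [ "$i" -lt 100 ]
do
  callit
  i=$((i + 1))   # arithmetic expansion; the older $[...] form is deprecated
done
echo "it is done"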

Here is a Python script that will work on a Linux server; it will download a web page 100 times (just replace 10.10.0.1 with the URL you want to download):

import os

# Download the page 100 times with curl, appending each copy to a file named "delete"
for i in range(100):
    os.system("curl http://10.10.0.1 >> delete")

Here is a PowerShell script that downloads a web page 100 times (replace http://10.10.0.1 with the URL of your own website):

# Download the page once and append its content to delete.txt
function dwnld-webpage {
   $website = Invoke-WebRequest -Uri "http://10.10.0.1"
   echo $website.Content >> delete.txt
}
# Call the function 100 times
for ($i = 0; $i -lt 100; $i++) {
  dwnld-webpage
}

You may want to try Gatling too.
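The three scripts above send one request at a time.  To put more simultaneous load on the load balancer, a Python sketch like the following sends requests from several threads at once using only the standard library.  run_load and fetch_once are hypothetical names, and 10 worker threads is an arbitrary starting point.  It tallies the HTTP status codes (or error types) so you can see whether any requests failed under load.

```python
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_once(url: str, timeout: float = 10.0):
    """Download the page once; return the HTTP status code or the error's class name."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read the whole body so the server sends the full page
            return resp.status
    except Exception as exc:  # connection refused, timeout, HTTP errors, etc.
        return type(exc).__name__


def run_load(url: str, total: int = 100, workers: int = 10, fetch=fetch_once) -> Counter:
    """Send `total` requests to `url`, at most `workers` at a time, and tally the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(lambda _: fetch(url), range(total)))
```

For example, print(run_load("http://10.10.0.1", total=1000, workers=20)) prints something like a count of 200 responses plus any errors.  The fetch parameter lets you swap in a different request function without touching the threading code.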

How Do You Install the Zabbix Client on an Ubuntu AWS Instance?

Problem scenario
You are running Ubuntu Linux in AWS.  You want to monitor it with your Zabbix server.  (To set up a Zabbix server in AWS, see this posting.)  How do you install the Zabbix client on an Ubuntu server?

Solution
Section 1

Make sure your AWS Security Group allows inbound connections from the internal IP address of the Zabbix server, at least on TCP port 10050 (the Zabbix agent's default listening port).  Otherwise the Zabbix server will not be able to monitor the client.

Section 2
Run this command:  sudo apt-get -y install zabbix-agent

Section 3
Go to the back end of the Zabbix server.  To find its internal IP address, run this command and note the address other than 127.0.0.1:  ip addr show | grep inet

Then run "hostname -f".  Keep this IP address and FQDN for the next steps performed on the other server.
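If you want to script this step, the Python sketch below pulls the IPv4 addresses out of ip addr show output and drops the loopback address, which leaves the internal IP address to use in the next section.  It assumes the iproute2 output format shown in the code comment; non_loopback_ipv4 and server_addresses are hypothetical helpers.

```python
import re
import subprocess


def non_loopback_ipv4(ip_addr_output: str) -> list:
    """Return 'address/prefix' strings from `ip addr show` output, skipping 127.x.x.x."""
    # Lines of interest look like: "    inet 10.1.1.1/24 brd 10.1.1.255 scope global eth0"
    # ("inet6" lines are not matched because the pattern needs whitespace after "inet").
    found = re.findall(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+/\d+)", ip_addr_output, flags=re.MULTILINE)
    return [addr for addr in found if not addr.startswith("127.")]


def server_addresses() -> list:
    """Run `ip addr show` on this machine and return its non-loopback IPv4 addresses."""
    output = subprocess.run(["ip", "addr", "show"], capture_output=True,
                            text=True, check=True).stdout
    return non_loopback_ipv4(output)
```

If Docker is installed on the same machine, the docker0 bridge address also appears in the list; use the eth0 address.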

Section 4
Return to the Linux server that will receive the agent.

sudo vi /etc/zabbix/zabbix_agentd.conf

Edit line 85 and change the "127.0.0.1" to the internal IP address of the Zabbix server.    It will look like this:
Server=10.1.1.1

Edit line 126 so "ServerActive=" is assigned the internal IP address of the Zabbix server.  It will look like this:
ServerActive=10.1.1.1

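Edit line 137 so that "Hostname=" is assigned the internal FQDN of this client server (run "hostname -f" on it to find the FQDN).  For active checks, this value must match the host name you enter in the Zabbix web UI in Section 5.  It will look like this:
Hostname=client.example.com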

Save the file.  Then run these commands from the command prompt to restart the agent so it reads the new settings, and to have it start at boot:

sudo systemctl restart zabbix-agent
sudo systemctl enable zabbix-agent
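Line numbers in zabbix_agentd.conf can shift between package versions, so you may prefer to set the options by name.  The Python sketch below rewrites the first uncommented Server=, ServerActive= and Hostname= lines (appending any that are missing) and leaves commented examples alone.  set_agent_options and update_conf_file are hypothetical helpers, and the values in the usage example are placeholders.  Restart the agent afterwards so it reads the new settings.

```python
def set_agent_options(conf_text: str, options: dict) -> str:
    """Return conf_text with each KEY=value in `options` set on its first active line."""
    lines = conf_text.splitlines()
    pending = dict(options)
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#") or "=" not in line:
            continue  # skip comments and blank lines
        key = line.split("=", 1)[0].strip()
        if key in pending:
            lines[i] = f"{key}={pending.pop(key)}"
    # Options with no active line in the file are appended at the end.
    lines.extend(f"{key}={value}" for key, value in pending.items())
    return "\n".join(lines) + "\n"


def update_conf_file(path: str, options: dict) -> None:
    """Apply set_agent_options to a file in place (run as root for /etc/zabbix)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_agent_options(text, options))
```

Example: update_conf_file("/etc/zabbix/zabbix_agentd.conf", {"Server": "10.1.1.1", "ServerActive": "10.1.1.1", "Hostname": "client.example.com"}).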

Section 5
Go to the web UI of the Zabbix server.  Go to Configuration -> Hosts -> Create Host.

In hostname, enter the internal FQDN of the client server.  In the Groups section, highlight "Linux servers" and "Virtual Machines."  Click the "left arrow" button to move these two groups to the "In Groups" section.  In the "IP Address" field of the "Agent interfaces" section, enter the internal IP address of the client server.

Click "Add" in the lower left-center section.

Click the hostname that is hyperlinked.

Click the lower of the two "Templates" tabs, the one that sits below the other tabs.


In the "Link new templates" field, type "Template OS Linux."  Click the correct suggestion as it appears.   Then type in "Template App SSH Service."  Click the correct suggestion as it appears.  Click "Add."  Then click "Update."

You are done.  Be prepared to see this alert "Lack of free swap space on ..."   These directions are just designed to get you started.

How Can You Learn about Kubernetes in One Hour?

Problem scenario
You need to learn more about Kubernetes but do not know where to start.  You have roughly one hour to learn as much as you can.  What should you do?

Solution
For an introduction to Kubernetes, see these two articles from Linux.com and TechTarget.

This DigitalOcean article is a little out of date (as of 9/18/17), but it still has useful information.

If you want to know why some people think Kubernetes is the hottest technology out there right now (as of September 2017), see this webpage.

For an overview of different concepts and keywords in Kubernetes, see either this link or this other one.

(A link that explained the Kubernetes API conventions in technical detail, for those who want to actually use the API, no longer works: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md)

Below are links to news about Kubernetes.  If you want to buy a book on Kubernetes, see this posting.

Kubernetes – Production-Grade Container Orchestration