CONTINUAL INTEGRATION – Page 202 – A Technical I.T./DevOps Blog

To Use Chef’s Basic Features, How Do You Register a Chef Client with Your Chef Server?

Problem scenario
You have installed Chef server on a RedHat Enterprise Linux (RHEL) server in AWS. You have installed Chef client on another RHEL instance in AWS. You simply want your Chef client to receive configuration management changes (e.g., you want Chef recipes to work). The command "chef node list" on your Chef client server returns no servers. You ran this command from the Chef client:

sudo chef-client -S https://<FQDN of Chef server>/organization/contint

But you received an error like this:

Chef encountered an error attempting to create the client "<FQDN of Chef Client computer>"
================================================================================

System Info:
------------
chef_version=13.4.19
ruby=ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
program_name=chef-client worker: ppid=5156;start=14:10:31;
executable=/opt/chef/bin/chef-client

[2017-09-13T14:10:31+00:00] WARN: *****************************************
[2017-09-13T14:10:31+00:00] WARN: Did not find config file: /etc/chef/client.rb, using command line options.
[2017-09-13T14:10:31+00:00] WARN: *****************************************
Starting Chef Client, version 13.4.19
Creating a new client identity for ip-172-31-10-187.us-west-1.compute.internal using the validator key.
[2017-09-13T14:10:32+00:00] ERROR: SSL Validation failure connecting to host: <FQDN of Chef server> - SSL_connect returned=1 errno=0 state=error: certificate verify failed
[2017-09-13T14:10:32+00:00] ERROR: SSL Validation failure connecting to host: <FQDN of Chef server> - SSL_connect returned=1 errno=0 state=error: certificate verify failed

Running handlers:
[2017-09-13T14:10:32+00:00] ERROR: Running exception handlers
[2017-09-13T14:10:32+00:00] ERROR: Running exception handlers
Running handlers complete
[2017-09-13T14:10:32+00:00] ERROR: Exception handlers complete
[2017-09-13T14:10:32+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 01 seconds
[2017-09-13T14:10:32+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out

You examined /var/chef/cache/chef-stacktrace.out. It shows

"OpenSSL::SSL::SSLError: SSL Error connecting to https://<FQDN of Chef server>/organization/contint/clients - SSL_connect returned=1 errno=0 state=error: certificate verify failed
...
>>>> Caused by OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=error: certificate verify failed"

On the Chef server you could not find a "trusted_certs" file or folder besides these:

"/opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/berkshelf-5.6.5/spec/data/trusted_certs
/opt/opscode/embedded/lib/ruby/gems/2.2.0/gems/chef-12.19.36/spec/data/trusted_certs
/opt/opscode/embedded/service/gem/ruby/2.2.0/gems/chef-12.19.36/spec/data/trusted_certs
/opt/chef-manage/embedded/lib/ruby/gems/2.3.0/gems/berkshelf-6.1.0/spec/data/trusted_certs
/opt/chef-manage/embedded/lib/ruby/gems/2.3.0/gems/chef-12.19.36/spec/data/trusted_certs
/opt/chef-manage/embedded/service/gem/ruby/2.3.0/gems/chef-11.16.2/spec/data/trusted_certs
/opt/opscode-push-jobs-server/embedded/lib/ruby/gems/2.2.0/gems/chef-12.13.37/spec/data/trusted_certs
/opt/opscode-push-jobs-server/embedded/service/gem/ruby/2.2.0/gems/chef-12.12.15/spec/data/trusted_certs
/opt/chefdk/embedded/lib/ruby/gems/2.4.0/gems/berkshelf-6.3.1/spec/data/trusted_certs
/opt/chefdk/embedded/lib/ruby/gems/2.4.0/gems/chef-13.3.42/spec/data/trusted_certs"

On the chef server a run of "knife ssl fetch" does not fix the problem. How do you solve this problem so your Chef client can register with the Chef server and eventually work?

Solution
The command "chef node list" may not work if ChefDK is installed. Use "knife node list" instead (from a workstation with knife or the Chef server itself). To see if ChefDK is installed, run "chef -v". If ChefDK is not installed on the Chef server, keep reading.

On the Chef client, create a client.rb file in the /etc/chef/ directory. Here is a minimal example (with just three lines) that you could modify and adapt to your own needs:

chef_server_url 'https://<FQDN of Chef server>/organizations/contint'
validation_client_name 'contint-validator'
ssl_verify_mode :verify_none

Replace <FQDN of Chef Server> with the FQDN of the Chef server. Replace "contint" with your organization name (e.g., the company nickname). Now this command from your Chef client should work:

sudo chef-client -S https://<FQDN of Chef server>/organizations/contint

Now "knife node list" should work from the Chef server.
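
If you register more than one node, the three-line client.rb can be generated rather than typed by hand. The following is a minimal sketch in Python, not part of Chef itself. The function name render_client_rb and the FQDN chef.example.com are hypothetical, and the sketch assumes the validator is named <organization>-validator, as in the example above.

```python
def render_client_rb(server_fqdn, org):
    """Return the text of a minimal /etc/chef/client.rb (hypothetical helper)."""
    lines = [
        # The Chef server API path uses the plural "organizations".
        "chef_server_url 'https://{}/organizations/{}'".format(server_fqdn, org),
        # Assumption: the validator client follows the <org>-validator pattern.
        "validation_client_name '{}-validator'".format(org),
        # Same setting as the example above; it turns off certificate checks.
        "ssl_verify_mode :verify_none",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # chef.example.com is a placeholder FQDN, not a real server.
    print(render_client_rb("chef.example.com", "contint"), end="")
```

The printed text could be saved as /etc/chef/client.rb on the Chef client; writing to that directory usually requires root.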

How Do You Find What Resource Groups You Have with Azure PowerShell?

Problem scenario
You want to find the resource groups in your Azure account. You have already installed Azure PowerShell on Windows 7, or you have installed the Azure modules on Windows 10. You have connected to an Azure account (e.g., with the Login-AzureRMAccount command). What is the PowerShell command to list the resource groups in the Azure account?

Solution
Run this command:
Find-AzureRmResourceGroup

# It will work with the Azure Cloud Shell too.

How Do You Know If Apache Parquet Is Installed?

Problem scenario
You are not sure if Apache Parquet has been installed on your Linux server.

Solution
Run this command:

parquet --help

# This assumes you have logged out and logged back in after installing it. It also assumes that the parquet executable is in a directory listed in the PATH environment variable.
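
If you would rather check the PATH assumption directly, Python's standard shutil.which function reports where an executable would be found. This is a small sketch; the helper name find_command is hypothetical, and it only tells you whether a file named parquet is on the PATH, not whether it runs correctly.

```python
import shutil


def find_command(name):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command("parquet")
    if path:
        print("parquet found at " + path)
    else:
        print("parquet is not on the PATH for this user")
```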

How Do You Write a Python Program That Can Run Linux Bash Commands?

Problem scenario
You want to use Python to run Linux commands. You have been told to not use the "import os" for this task. You want to manipulate the text and output of Bash commands for sophisticated processing and automation with Python. How can a Python program run Bash commands?

Solution
Use "from subprocess import check_output" as the first line. Then encapsulate the Bash command inside double quotes, square brackets and parentheses like these two lines of Python code:

out = check_output(["date"])
sn = check_output(["hostname"])

Later in your Python program, you will be able to use "out" and "sn" as the date and servername respectively. Here is an example (test.py) of how to use the "date" and "hostname" Bash commands inside a Python script:

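from subprocess import check_output
# Capture the output of each command
out = check_output(["date"])
sn = check_output(["hostname"])
# Loop over the hostname output one element at a time
# (characters in Python 2, byte values in Python 3)
for i in sn:
    print(i)
print(sn)
print(out)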

If you are not constructing the Bash commands from an untrusted source, or you have a sanitization system for such input, you can use "shell=True". This will enable you to invoke a wider range of Bash commands (including those with pipes "|") with greater complexity. Here is a Python 2 example of how to use "shell=True":

from subprocess import check_output
out = check_output(["cat /etc/*-release | grep NAME | grep PRETTY"], shell=True)
sn = check_output(["hostname"])
print out

The above in Python 3 is here:

from subprocess import check_output
# check_output returns bytes in Python 3, so decode it to get a str
out = check_output(["cat /etc/*-release | grep NAME | grep PRETTY"], shell=True).decode().strip()
sn = check_output(["hostname"])
print(out)
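
In Python 3.7 and later, passing text=True to check_output returns a str directly, so no decoding step is needed. The sketch below also shows one way to avoid "shell=True" for the pipeline above: read the files and filter the lines in Python instead of with cat and grep. The helper names run_text and grep_lines are hypothetical.

```python
import glob
import os
from subprocess import check_output


def run_text(cmd):
    """Run a command given as a list and return its output as a stripped str."""
    return check_output(cmd, text=True).strip()


def grep_lines(text, *needles):
    """Return the lines of text that contain every needle, like chained greps."""
    return [line for line in text.splitlines()
            if all(needle in line for needle in needles)]


if __name__ == "__main__":
    print(run_text(["date"]))
    # Rough equivalent of: cat /etc/*-release | grep NAME | grep PRETTY
    release_text = ""
    for path in glob.glob("/etc/*-release"):
        if os.path.isfile(path):
            with open(path) as handle:
                release_text += handle.read() + "\n"
    for line in grep_lines(release_text, "NAME", "PRETTY"):
        print(line)
```

Because no shell is involved, there is no command string for untrusted input to alter.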

How Do You Troubleshoot the Problem “ImportError: Entry Point (‘console_scripts’, ‘Parquet’) Not Found”?

Problem scenario
You are trying to run Apache Parquet commands. But each command gives this error:

Traceback (most recent call last):
  File "/usr/local/bin/parquet", line 11, in <module>
    load_entry_point('parquet==1.2', 'console_scripts', 'parquet')()
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 570, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2750, in load_entry_point
    raise ImportError("Entry point %r not found" % ((group, name),))
ImportError: Entry point ('console_scripts', 'parquet') not found

How do you fix this problem?

Solution
1. Remove pip (e.g., with Ubuntu run "sudo apt-get remove python-pip")

2. Reinstall pip (e.g., with Ubuntu run "sudo apt-get install python-pip")

3. Upgrade pip (e.g., with Ubuntu run "sudo easy_install --upgrade pip")
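
To see whether a "parquet" entry point is registered at all, you can list the console_scripts entry points known to an interpreter. This is only a sketch and it rests on an assumption: it uses the standard importlib.metadata module from Python 3.8 or later, whereas the traceback above comes from Python 2.7 and pkg_resources. The function names are hypothetical.

```python
from importlib.metadata import entry_points


def console_script_names():
    """Return the set of console_scripts entry point names that are installed."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose a select() method.
        group = eps.select(group="console_scripts")
    else:
        # Python 3.8 and 3.9 return a dict keyed by group name.
        group = eps.get("console_scripts", [])
    return {ep.name for ep in group}


def has_console_script(name):
    """Return True if an entry point with this name is registered."""
    return name in console_script_names()


if __name__ == "__main__":
    print("parquet entry point registered:", has_console_script("parquet"))
```

Run it with the same interpreter that the package was installed for; otherwise the result describes a different set of packages.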

How Do You Learn More about Quality Assurance Automation Tools?

Problem scenario
You want to learn more about QA automation tools. What do you do?

Solution
The DevOps Zone web page has an article entitled "Top 10 Testing Automation Tools for Software Testing." People who are unfamiliar with QA-related technologies may not recognize the names of the companies behind the tools, except HP. Many companies rely on HP's QA tools despite the fact that HP may sell its software development tools business unit to a different company. To learn more about QA tools themselves, you may be interested in purchasing a book or two. You may also want to go to Selenium's website. Another good article is this TechTarget one.

How Do You Deploy Apache Mesos and Apache Marathon to an Ubuntu Linux Server in AWS?

Problem scenario
You want to install Apache Mesos and Apache Marathon to an Ubuntu 16.x Linux server in AWS. How do you do this?

Prerequisites
You need two Linux instances with relevant Security Group rules added to allow for connectivity between the two. One Ubuntu Linux server will be for the Mesos master and the other server will be for the Mesos slave. You also need to be able to reach the Mesos master server with a web browser. You can use Apache Mesos and Marathon with other distributions of Linux. This solution happens to use Ubuntu.

Solution
Steps 1 through 8 will all be done on the server that will become the Apache Mesos master server.
1. On the server that will be the Apache Mesos master server, run these commands:

sudo apt-get -y update
sudo apt-get -y install zookeeperd

2. Run this command:
sudo service zookeeper status
# Enter these characters without the quote marks ":q" to escape

3. Run this command:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv DF7D54CBE56151BF

4.a. Create this file /etc/apt/sources.list.d/mesosphere.list
4.b. This file above should have this line:
deb http://repos.mesosphere.com/ubuntu xenial main

5. Run these commands:
sudo apt-get -y update
sudo apt-get -y install mesos

6. Create this file /etc/mesos-master/advertise_ip
It should have the external IP address of the server:
x.x.x.x

To find what x.x.x.x should be if you do not know it, run "curl http://icanhazip.com" with no quotes from the master.

7. Create this file /etc/mesos-slave/advertise_ip
It should have the external IP address of the server:
y.y.y.y

To find what y.y.y.y should be if you do not know it, run "curl http://icanhazip.com" with no quotes from the slave.

8. Run this command:
sudo service mesos-master start

Steps 9 through 16 should be done on the server that will be the Apache Mesos slave server.

9. Go to the server that will be the Mesos slave. Run these commands:

sudo apt-get -y update
sudo apt-get -y install zookeeperd

10. Now run this command:
sudo service zookeeper status
# Enter these characters without the quote marks ":q" to escape

11. Run this command:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv DF7D54CBE56151BF

12.a. Create this file /etc/apt/sources.list.d/mesosphere.list
12.b. The file above should have this line:
deb http://repos.mesosphere.com/ubuntu xenial main

13. Now run these commands:
sudo apt-get -y update
sudo apt-get -y install mesos

14. Create this file /etc/mesos-master/advertise_ip
It should have the external IP address of the server:
x.x.x.x

To find what x.x.x.x should be if you do not know it, run "curl http://icanhazip.com" with no quotes from the master.

15. Create this file /etc/mesos-slave/advertise_ip
It should have the external IP address of the server:
y.y.y.y

To find what y.y.y.y should be if you do not know it, run "curl http://icanhazip.com" with no quotes from the slave.

16. Now run this command:
sudo service mesos-slave start

17. Go back to the Mesos master server. Run these commands:
sudo apt-get -y install marathon
sudo service marathon start

You are done.

18. Optional step: Now you can open a web browser knowing what the external IP address of the Mesos master server is. To see the Mesos web UI, go here (where x.x.x.x is the external IP address of the Mesos master): x.x.x.x:5050
To see the Marathon web UI, go here (where x.x.x.x is the external IP address of the Mesos master): x.x.x.x:8080
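
If a browser cannot load either page, it may help to first check whether the two ports accept TCP connections at all (for example, a Security Group rule may be missing). The following is a minimal sketch using only Python's standard socket module. The helper name port_open is hypothetical, and the address in the example is a placeholder for the external IP address of your Mesos master.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    master_ip = "127.0.0.1"  # placeholder: replace with x.x.x.x
    for name, port in (("Mesos", 5050), ("Marathon", 8080)):
        state = "reachable" if port_open(master_ip, port) else "not reachable"
        print("{} web UI on port {}: {}".format(name, port, state))
```

A reachable port only shows that something is listening; it does not confirm that the web UI itself is healthy.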

How Do You Use PowerShell on Your Desktop to Manage Azure?

Problem scenario
You want to use PowerShell on your local machine to create, delete, and restart servers in Azure. You want to do other things with Azure using PowerShell on your workstation. What do you do?

Solution
1. To run Azure commands from your desktop, you need to connect your PowerShell ISE with your Azure instance. If you are using Windows 10, see this posting and ignore the directions here. If you are using Windows 7, proceed to step #2 and remember that you will need to reboot your workstation.

2.a. Install the Azure tools; you can download them here.

2.b. When the installer above is open, click "Add" next to "Microsoft Azure PowerShell."

3. Click "Install."

4. You can read the license terms. It appears that the licenses are the Apache License and MIT License. If you agree click "I accept."

5. Reboot your computer.

6. Open PowerShell. Run this command: Get-AzurePublishSettingsFile

# Be prepared for a web browser to open up with the Azure Portal website.

7. Log into the Azure Portal via the web browser as you normally would.

8. You'll be prompted to open or save a file. Save it wherever you like on your desktop.

9. Draft a command like this:

Import-AzurePublishSettingsFile -PublishSettingsFile "C:\Users\jdoe\downloads\mysettings.publishsettings"

# where C:\users\jdoe\downloads\ is the path to the file you just downloaded.

Run this command once you've changed the location.

10. Optional step. Run this command to verify it worked:

Get-ChildItem -Recurse cert:\ | Where-Object {$_.Issuer -like '*Azure*'} | select FriendlyName, Subject

# Above command was taken from this article.

11. Now you can run Azure commands. Optionally you can try this one: Get-AzureLocation

How Do You Troubleshoot a Problem with Adding a DataNode to a Hadoop Cluster?

Problem scenario
You are trying to add a DataNode to an existing Hadoop cluster. There are numerous problems. What do you do to troubleshoot the process?

Possible solutions
1. New versions of Hadoop use a "workers" file -- not a "slaves" file.

2. Do you have a DNS solution in place for your DataNode and NameNode to resolve each other? If you do not have a DNS server, does the /etc/hosts file of the DataNode server have an entry for the NameNode? Can the NameNode itself resolve the domain name of each DataNode server?

3. Can the "hduser" (or whichever user starts the Hadoop cluster) passwordlessly SSH into the DataNode from the NameNode? This is generally necessary for a multi-node cluster. If you need directions for setting up passwordless SSH, see this article.

4. Warning: This will delete all the data from your cluster. If you are having a problem adding a DataNode to a cluster, you may want to try deleting all directories and files in /app/hadoop/tmp/ on the individual DataNodes that are not being added when you run start-dfs.sh on the NameNode. After you delete the data in /app/hadoop/tmp/ on the DataNodes, you may want to run hdfs namenode -format on the NameNode. This will delete all the data in your Hadoop cluster; however it can help with troubleshooting. You can then run the start-dfs.sh script to see if the DataNodes will be added to the cluster.

5. When you run the jps command on the NameNode and DataNode, do you see a Hadoop node component running? You may need to shut down all node services first and restart them. To install jps, see this article.

6. Check the /usr/local/hadoop/etc/hadoop/core-site.xml files on the DataNodes. Do they have "localhost" or do they have the hostname of the NameNode? They need the hostname of the NameNode. If you forgot to modify these files, the respective DataNode will not join the cluster as normal. The "jps" command will still show the "DataNode" service starting and stopping as you control it via the NameNode. But the DataNode will not make its storage capacity available.

7. If the DataNode service starts and stops with the NameNode's run of start-dfs.sh and stop-dfs.sh but you are not seeing the DataNode in the cluster with an "hdfs dfsadmin -report" command, the problem could be that there is a firewall rule protecting the NameNode that blocks port 54310.

You may want to use nmap on the DataNode to determine whether the port defined in /usr/local/hadoop/etc/hadoop/core-site.xml (the path may be different) is blocked. If this port is blocked you can still start and stop the DataNode service from the NameNode, but there will be a problem with the DataNode actually working in the cluster.

8. Are you using the start-all.sh script on the NameNode? Some Hadoop administrators use combinations of start-dfs.sh on both the NameNode and the DataNode. It may be easier if you use the start-all.sh script from the NameNode. The start-all.sh script is not advisable in production environments. This is just for troubleshooting.

9. Search for a dfs.hosts.exclude file. Is the server that will not join the cluster in there?

10. Do you have a dfs.include or dfs.hosts file? Could you try adding the server to one of these files and restarting the services on the name node?

11. Warning: This will delete all the data from your cluster. You may want to read these step-by-step directions to start over with your deployment (i.e., you may want to reinstall and reconfigure from the beginning).

12. Optional reading on Apache's website.
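
For items 6 and 7 above, a short script can report which host and port a core-site.xml file points at, so you can spot a leftover "localhost" and know which port to test. This is a sketch under assumptions: it looks for the fs.defaultFS property (or the older fs.default.name), and the sample XML, the hostname namenode.example.com, and the function name namenode_address are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical sample; a real file lives in the Hadoop configuration directory.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:54310</value>
  </property>
</configuration>"""


def namenode_address(xml_text):
    """Return (host, port) from fs.defaultFS or the older fs.default.name."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in ("fs.defaultFS", "fs.default.name"):
            uri = urlparse((prop.findtext("value") or "").strip())
            return uri.hostname, uri.port
    return None, None


if __name__ == "__main__":
    host, port = namenode_address(SAMPLE)
    if host in ("localhost", "127.0.0.1"):
        print("core-site.xml still points at localhost")
    else:
        print("NameNode address: {}:{}".format(host, port))
```

To check a real DataNode, pass the contents of its core-site.xml file to namenode_address instead of SAMPLE.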

How Do You Create a Chef Automate Server in Azure?

Problem scenario
You want to have a Chef Automate server in Azure. How do you create one?

Solution
1. In the Azure portal click the "New" button on the left.
2. Search for "Chef Automate" with no quotes.
3. Click "Create".
4. Fill out the required configuration settings. Make a mental note of the Chef Automate FQDN DNS Label that you provide (e.g., contint.eastus.cloudapp.azure.com). For the selection of a Chef Automate license file, you can use none and be provisioned with a 30-day trial license.
5. If you can agree to the Terms of Use and various conditions, click "Purchase."
6. While the process is completing, install the Chef Development Kit on your Windows, Mac, or Linux workstation. To learn how, use this link, but you do not need to do the section on "Setting up the Chef repo."
7. When the "Chef Automate" server has been provisioned, compose a URL like the one below (do not use the quotes), replacing "contint.eastus.cloudapp.azure.com" with the FQDN you chose in the process above:

"https://" + "contint.eastus.cloudapp.azure.com" + "/biscotti/setup"

8. Go to the URL (the one you composed in the previous step) via a web browser. You may want to confirm the security exception to proceed.
9. In the web UI, enter the info as it is required. If you can agree to the EULA and Master Agreement, click "Setup Chef Automate & Download Starter Kit."
10. Save the .zip file when prompted.
11. Click the web UI button for "Login to Chef Automate". You are now done.
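
The URL from step 7 is plain string concatenation. As a small sketch in Python (the function name automate_setup_url is hypothetical, and the FQDN is the example one from step 4):

```python
def automate_setup_url(fqdn):
    """Build the setup URL for a Chef Automate server from its FQDN."""
    return "https://" + fqdn + "/biscotti/setup"


if __name__ == "__main__":
    print(automate_setup_url("contint.eastus.cloudapp.azure.com"))
```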