How Do You Copy Files into a Docker Container from the Server’s Command Line?

Docker is itself a dependency resolution tool.  It allows a DevOps engineer to prepare, one time, an OS environment with nuanced dependencies and configurations into which other packages can be installed.

Leveraging the efficiency of a configuration management tool (such as Ansible, CFEngine, Chef, Puppet, or SaltStack) can empower DevOps engineering.  It can also entail duplicating deployments across different environments (development, quality assurance, staging, and production).  Having a backup plan for disaster recovery is also important.  The Docker container may not have everything that the host OS has, so some dependencies may need to be installed before the CM tool will work.  You may need to place two important files in /usr/bin/: ssh and sftp.  Ansible requires both of these binaries.  The host OS may be an acceptable source for such files.

To copy these files into Docker, do the following:

1)  Use this command to find the container ID: docker ps
2)  If the container is not running, use these two commands:

docker ps -a
#then use
docker start <containerID>

3) docker cp /usr/bin/ssh <containerID>:/usr/bin/ssh
4) docker cp /usr/bin/sftp <containerID>:/usr/bin/sftp
5) Enter the Docker container with this:
docker exec -it <containerID> bash
6) Issue these commands (while inside the Docker container):
chmod 755 /usr/bin/ssh
chmod 755 /usr/bin/sftp

Now these files will exist and be executable. 
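Taken together, the steps above amount to the following hedged sketch (abc123 is a hypothetical container ID; substitute the value that docker ps -a reports):

docker ps -a                                    # steps 1 and 2: find the container ID
docker start abc123                             # step 2: start the container if it is stopped
docker cp /usr/bin/ssh abc123:/usr/bin/ssh      # step 3: copy ssh from the host
docker cp /usr/bin/sftp abc123:/usr/bin/sftp    # step 4: copy sftp from the host
docker exec -it abc123 bash                     # step 5: enter the container
chmod 755 /usr/bin/ssh /usr/bin/sftp            # step 6: run this inside the container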

Many CM-related tools require Ruby.  To install Ruby from source, you need to have make installed in the container.  Rather than try to install it, and assuming the Docker host runs the same Linux distribution as the container, use this command:  docker cp /usr/bin/make <containerID>:/usr/bin/make
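To confirm the copied binary actually runs inside the container, here is one hedged check (abc123 is again a hypothetical container ID):

docker exec abc123 make --version    # prints the version banner if make works in the container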

(If you need directions for installing Docker on any type of Linux in any public cloud, view this posting; it should probably be enough to help you.)

OpenStack Wikipedia Article: Sahara Paragraph Updated

I edited Wikipedia's OpenStack Article found here.  This is the paragraph for Sahara as I found it on 4/5/16:
"Sahara aims to provide users with simple means to provision Hadoop clusters by specifying several parameters like Hadoop version, cluster topology, nodes hardware details and a few more. After a user fills all the parameters, Sahara deploys the cluster in a few minutes. Sahara also provides means to scale an already-provisioned cluster by adding and removing worker nodes on demand."
This is what I revised it to be:
"Sahara is a component to easily and rapidly provision Hadoop clusters. Users will specify several parameters like the Hadoop version number, the cluster topology type, node flavor details (defining disk space, CPU and RAM settings), and others. After a user provides all of the parameters, Sahara deploys the cluster in a few minutes. Sahara also provides means to scale a preexisting Hadoop cluster by adding and removing worker nodes on demand."

How to Handle a “Failed to connect to the Docker daemon” Message in Linux

To see whether Docker has started, run this command:
ps -ef | grep -i docker
If that returns only the grep process itself, then Docker is not running.  Occasionally the Docker service won't start through traditional methods.  But some users have found that this command will work reliably:
docker daemon &
(On newer versions of Docker, the daemon binary is named dockerd, so the equivalent command would be dockerd &.)  The "&" allows the next prompt to return.  This method is also instructive for new users of Docker, because it prints more verbose informational messages to the console than "systemctl start docker" does.

How do you install two or more RPM packages when they depend on each other?

Question:  How do you solve circular dependency problems when installing RPMs in RedHat Linux?
Problem Scenario:  For example, you keep trying to install different RPMs, but each one requires another package to be installed first.  By exhaustively going through the dependencies, you find a circle of dependencies.  This is sometimes called mutual recursion.

Root cause:  Human error.

Solution:  The way to resolve circular dependencies is with a yum localinstall command followed by a list of each of the RPM packages.  For example, if packageA.rpm depends on packageB.rpm being installed, packageB.rpm depends on packageC.rpm being installed, and finally packageC.rpm depends on packageA.rpm being installed, what do you do?  Put packageA.rpm, packageB.rpm, and packageC.rpm in the local directory.  Then do this:
yum localinstall packageA.rpm packageB.rpm packageC.rpm

For people new to patching RedHat derivatives, learning to apply different patches simultaneously solves the circular dependency problem.  If there is still an error message, and it seems impossible to solve, look closely at the error message.  The "Requires" portion of the message may mention a sub-version (e.g., a subtle .5 after a number) that is slightly higher than one of the versions you are trying to install.  Certain combinations of versioned .rpm files can be finicky.  The solution is possible, however.  Once you have the correct versions of the (potentially long-named) .rpm files, do the following:

Step #1:  Go to the directory where your .rpm files are.

Step #2:  Issue one of the following:

sudo yum localinstall *.rpm
sudo rpm -ivh *.rpm

Any persistent error message may be telling you something.  There may be a version incompatibility.
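If you suspect such a version incompatibility, one hedged way to inspect exactly what a local package demands (packageA.rpm is a hypothetical file name):

rpm -qpR packageA.rpm    # list the Requires entries of a not-yet-installed .rpm file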

Possible Problems With Rendering PHP with Apache

Problem scenario
Your PHP code is being displayed raw.  It is not being rendered or presented as it should be.  When you open a web browser and go to the .php page, you see raw PHP text.  What should you do?

Solution
If the PHP page is blank (or completely white), see this posting.

Is your PHP program using Linux Bash commands? If so, see this posting.

If a PHP page is retrievable from a web browser, but it shows only text and the actual PHP source code, Apache is at least somewhat working.  What is the root cause?  One potential root cause is that PHP has not been installed.  If you are running Ubuntu Linux, run this command to remedy two potential root causes:  sudo apt-get -y install php7.0 libapache2-mod-php
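One quick, hedged way to see whether Apache has loaded a PHP module at all (the exact module name varies by PHP version, and on Ubuntu the command is apache2ctl):

apachectl -M | grep -i php    # list loaded Apache modules and filter for PHP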

Another root cause of this problem may be that the Apache web server has not been configured correctly.  Find the httpd.conf file (this applies if you are not running Ubuntu; with Ubuntu there will be an apache2.conf file instead).  You may want to consult this external posting.  Otherwise, find the "AddType" section of the httpd.conf file.  Ensure that this line is present:  AddType application/x-httpd-php .php

The second stanza to look for is the LoadModule one.  In the LoadModule section of the httpd.conf file, ensure that a line like this is present (assuming you are using PHP5):
LoadModule php5_module modules/libphp5.so

You may need to replace "modules" above with the absolute path to the libphp5.so file.  The libphp5.so file may not be present after a Drupal installation and configuration.  Proper Drupal* installations compile PHP using the ./configure script with a flag like this: --with-apxs2=<pathToAPXS>
The value <pathToAPXS> should be the results of this command: which apxs
If this which apxs command returns nothing, try one of these commands:
    -If you are using a RedHat derivative OS: # echo "extension=apc.so" > /etc/php.d/apc.ini
    -If you are using a Debian distribution: # echo "extension=apc.so" > /etc/php5/conf.d/apc.ini

The flag would be --with-apxs (with no "2") if the Apache version is older. 

*Drupal is a content management system. To find out how it is pronounced, go here.

How To Potentially Solve an HTTP 403 Error on An Apache Server

Problem scenario:  You are trying to access a file on a website, but you get a 403 Forbidden error every time.  What are some things to look for to fix this problem?

Solution:

If you do not have access to the back-end of the web server, try these:

  1. Clear the cache/history from your web browser.
  2. Clear the cookies from your web browser.
  3. If you are using wget, try this:  wget -U firefox http://continualintegration.com/     (where continualintegration.com is the URL of the website).
  4. Verify the URL is correct. Have you entered it with correct case sensitivity? Some URLs can be case sensitive.

If you have access to the web server, try these steps: 

  1. Verify that read permissions are given to the user trying to access the file.  For example, -r--r--r-- would be the minimum permissions needed.  You could use this command: "chmod 644 file.txt" (with no quotes, where file.txt is the file you are trying to retrieve).  If the parent directories of the file on the Apache server have strict permissions (at the regular OS-user level on the back end of the server), users on the web front end may encounter the 403 "Forbidden" message in their web browsers or from a wget command.
  2. Another cause of this problem is that the <Directory> </Directory> section of the httpd.conf file could be configured with a "Require all denied" stanza.  If this line appears, the DocumentRoot directory will be locked down.  That is, no web page will be visible beyond the default "Testing 123" Apache page.  To open up the DocumentRoot directory tree, change "Require all denied" to "Require all granted" (see the sample stanza after this list).  This way the Apache server will present a web page to requesters without a "Forbidden" or 403 error.  By default, if you install the Apache web server and change the DocumentRoot directory to something besides the default /var/www/html, other references to that very same directory will not be changed automatically.  Therefore you need to change those references in httpd.conf.  To find httpd.conf, try this Linux command: "find / -name httpd.conf" (with no quotes).
  3. The problem could be intermittent. Read this for more information.
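For item #2 above, the relevant httpd.conf stanza might look like this hedged sketch (/var/www/html is a common default DocumentRoot; substitute your own):

<Directory "/var/www/html">
    # "Require all granted" lets Apache serve this directory tree;
    # "Require all denied" would cause 403 Forbidden responses.
    Require all granted
</Directory>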

OpenStack Sahara Documentation

Some open source projects don't always listen to contributors' feedback.  We reported a couple of errors that we found in OpenStack documentation to openstack.org.  Here are the errors we saw (as of 2/2/17):

#1  If you go to this link, you'll find two "Storm EDP" links:
http://specs.openstack.org/openstack/sahara-specs/

One points to this link:  http://specs.openstack.org/openstack/sahara-specs/specs/liberty/storm-scaling.html

We see no reason why the title/header of the above page is "Storm EDP" and not "Storm Scaling."  Our attempted contribution was to eliminate the duplicate "Storm EDP" links on the page at the first link of this post.

#2  We found this ungrammatical sentence here (which needs the word "needs" instead of "need"):
"Sahara need more flexible way to work with security groups."  This was taken from: http://specs.openstack.org/openstack/sahara-specs/specs/juno/cluster-secgroups.html

The OpenStack Foundation probably has limited resources.  But if it had a way to listen to each person's contributions, progress would be more rapid.  The more requests are ignored, the less likely future contributions become.

SaltStack Technology and Terminology

SaltStack provides for more complex configuration management than Ansible (another Python-based configuration management tool).  Some people have criticized Salt for having too many new vocabulary words.  Like all complex technologies, it takes time to get used to.  To help you learn about Salt, I thought I'd provide an overview.

An SLS file is a SaltStack State file.  This file is the basis for determining the desired configuration of the client servers, which are called Salt Minions.  A pre-written State file is called a formula in the world of SaltStack.  Just as sodium and chloride can be the basis of other compounds, formulas can be the basis of complex desired-state configurations.  Grains, in SaltStack terminology, are data about a Salt Minion.  A grain may include information such as the OS type of a Minion server.  This data is generated from the Minion.  Pillars are data about Salt Minion servers too.  But pillars are stored on the Salt Master server.  Pillars are encrypted, and they are ideal for storing sensitive data that should only go to certain Salt Minion servers.  Pillar .sls files hold data like a state tree (a collection of .sls files), except that pillar data is only available to servers that match a given "matcher" type.
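As a minimal, hedged sketch, an SLS file is YAML that declares a desired state.  The file name webserver.sls and the state ID install_web_server below are hypothetical:

# /srv/salt/webserver.sls -- declare that the apache2 package should be installed
install_web_server:
  pkg.installed:
    - name: apache2

A Salt Master could apply it with a command like salt 'minion1' state.apply webserver (minion1 being a hypothetical Minion ID).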

Beacons, in the context of SaltStack, are constant listeners for a condition to be met.  If used properly, a beacon can have a corresponding action taken by the "reactor system."  A reactor sls file will have a condition and trigger an action because of the beacon listener.

The first two paragraphs were a combination of original content and content paraphrased from these two links:  Pillar and Highstate.  The final paragraph was paraphrased from this link on Reactors.

Containerization Has Its Advantages Over Virtualization

Containers, such as Docker containers, communicate with each other through a shared kernel.  Guest virtual machines communicate with each other through the hypervisor or host operating system.  Containers enjoy faster communication because staying within a shared kernel is quicker than leaving a virtual machine and going out to a hypervisor (or host operating system) to communicate with another virtual machine.  Containers allow for sequestration of processes with fewer operating system licenses than a comparable solution built on virtual machines.  Virtual machines can separate processes, but they require an operating system license for every virtual machine.

DevOps and ETL Quiz

Extract-Transform-Load workflows involve considerable architecture, including a workflow over a network to take data from a flat file and ingest it into a database.  Automation is one way to manage the ETL support system.  DevOps engineers commonly support database installations and configurations.  DevOps engineers also commonly support continual delivery pipelines.  This automated process (involving automatic deployments) is often similar to automating an ETL process.  DevOps engineering, build and release engineering, automation development, and ETL design are all interdisciplinary fields of information technology.  This is a quiz related to both DevOps and ETL topics.

1.  What is the DevOps tool for databases?

a.  QuerySurge
b.  Beehive
c.  Stratos
d.  DBMaestro

2.  What does mung mean?

___________________________________________________________________________

3.  What does idempotent mean?

___________________________________________________________________________

4.  What is the name of the process of actively preparing data for serialization (e.g., data that was not otherwise logically contiguous on disk, destined for a buffer)?  This process may include modifying data from one programming language or interface so it is compatible with a different programming language or interface.

a. Almquist variation
b. inmoning
c. scrum transition
d. marshalling

5.  How is an imperative process different from a declarative process?

___________________________________________________________________________

6. What is a common tool that both ETL Developers and DevOps Engineers use?

___________________________________________________________________________

7. Which of the following can you not create an AWS Data Pipeline with?

a.  AWS Management Console
b.  AWS Command Line Interface
c.  AWS SDKs
d.  AWS APIs
e.  None of the above

8.  True or false: Mesos clusters cannot work with both HDFS and Digital Ocean.

True
False

9.  True or false: Hadoop YARN cannot act as a scheduler for OpenShift.

True
False

10.  Which of the following Apache products can create ETL jobs?

a.  Accumulo
b.  Pig
c.  Stanbol
d.  Lucene

11.  Which of the following is not an ETL product?

a.     IBM InfoSphere Datastage
b.     Oracle Warehouse Builder
c.     Business Objects XI
d.     SAS Enterprise ETL server
e.     Stratos
f.     Informatica
g.     Apache Hadoop
h.     Talend Big Data Integration

12.  In Informatica, are mapplets only able to be used once without logic?

Yes
No

13.  Which of the tools below are designed to aid in testing ETL processes and validating data warehouses?

a.  QuerySurge by Real-Time Technology Solutions
b.  DBMaestro
c.  Apache Cassandra
d.  Apache Stratos
e.  ETL Validator by datagaps inc.

14.  What is an example of cooked data in the context of ETL/Devops?

a.  Machine-corrupted data (e.g., from disk failure)
b.  Content that was corrupted maliciously
c.  Cleansed data
d.  Intentionally masked data (to hide identities)

15.  What is the technique that divides a table of a database into different subcomponents, such as partitioning columns, to improve read and write performance?

a.  data marting
b.  impedance matching
c.  sharding
d.  redis

16.  What tool allows you to designate when Docker containers process ETL jobs without manual configuration?

a.  Pachyderm
b.  Chronos
c.   Overwatch
d.  emerge-sync

17.  Which of the following can readily be used as a superior ETL platform?

a.  Hadoop
b.  Teradata
c.  Proxmor
d.  Note Beak

18.  There is consensus that small companies should use Informatica or a supported, proprietary ETL tool as opposed to an in-house developed tool.

True
False

19.  Which of the following has an open source version:

a.  Talend Integration Suite
b.  Pentaho Kettle Enterprise
c.  CloverETL
d.  All of the above
e.  None of the above

20.  What is a data lake?

a.  A synonym of data warehouse
b.  A buffer of streamed data
c.  An archive of metadata about previous real-time data streams
d.  A pool of unstructured data

21.  What is a data swamp?

a.  A dense data lake
b.  A severely degraded data lake
c.  A synonym of a data warehouse
d.  A pool of unstructured data
e.  An archive of metadata about previous real-time data streams

22.  Snappy is the name of which two concepts?

a.  The REST API for SnapChat
b.  A data compression and decompression library with bindings for several languages
c.  A Linux package management system
d.  An automation scheduler for Informatica
e.  An open source component to migrate SSIS packages to PostgreSQL

23. In a SQL database, you have a left table with four rows and a right table with seven rows.  What is the highest number of rows that can be returned by an inner join?

a. 0
b. 4
c. 11
d. More than 11

24. Which of the following provide Sqoop-based connectors (choose all that apply)?

a. Teradata
b. Talend Open Studio
c. Informatica (modern versions)
d. Pentaho

25. What is a continuous application?

a. The namesake of CA traded on the Nasdaq as CA
b. An application that encompasses data streaming (e.g., ETL processes) from start to finish that adapts itself to the data stream(s) in real-time
c. An application that leverages ETL processing
d. An application receiving continuous integration (or continual integration)
e. An application receiving continuous delivery (or continual delivery)
f. An application receiving continuous deployments (or continual deployments)
g. An application that is always available through fault tolerance and load balancing

26. DevOps expert Gene Kim got his start with a security product called Tripwire, known for its emphasis on changes to files. There is a tool that keeps track of changes to a database. Which product below concerns itself with tracking changes of database schemas?

a. MongoDB
b. DBVersion
c. Databasegit
d. Liquibase

27. Which product enables you to quickly make copies of SQL Server databases for your Test, QA or development environments? Choose the most accurate answer.

a. Canonical's Juju
b. RedGate's SQL Provision
c. Apache Hamster
d. Apache Numa

28. The SQL Server database backups are not working, or you get false positives that your backup solution is successfully backing them up.  What should you implement for a practical backup solution?

a. Write your own PowerShell script that backs up the database
b. Implement AlwaysOn Availability Groups
c. Implement RedGate's Toolbelt
d. Implement Apache Impala

29. Which AWS tools can perform ETL jobs? Choose two.

a. DMS (Database Migration Services)
b. DMS (Data Manipulation Service)
c. Glue
d. Cognito
e. Federation

30. Test Kitchen works for which of the following?

a. Chef
b. Terraform
c. PowerShell DSC
d. All of the above

*** See answers to quiz. ***
DevOps Books