Problem scenario
You are running Debian or Ubuntu Linux. You want to use pip to work with Python packages. What do you do?
Solution
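Run these two commands (on newer Debian and Ubuntu releases that no longer ship Python 2, the package is named python3-pip instead):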
sudo apt-get -y update
sudo apt-get -y install python-pip
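Once the install finishes, a quick check can confirm that pip is reachable. The following is a minimal sketch, assuming a Python interpreter is already on the server; the function name `pip_status` is hypothetical and not part of any package.

```python
import importlib.util
import shutil


def pip_status():
    """Report whether pip is importable and whether a pip executable is on PATH."""
    return {
        # True when the running interpreter can locate the pip module
        "module_available": importlib.util.find_spec("pip") is not None,
        # Path of the first pip or pip3 executable found on PATH, or None
        "executable": shutil.which("pip") or shutil.which("pip3"),
    }


if __name__ == "__main__":
    print(pip_status())
```

If both values come back empty, the package most likely did not install for the interpreter you are running.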
A Technical I.T./DevOps Blog
Problem scenario
You run an Ansible playbook on a server as a remote user that is a sudoer on that same server. But you get this error when you run the playbook: "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}" How do you troubleshoot this error?
You want to run Bash scripts via Ansible playbooks. But these scripts will install packages and modify sensitive files. These scripts need to be run as a sudoer. How do you run Ansible playbooks on managed nodes that run shell scripts that require sudo?
Solution
Root cause
A non-Ansible component is the root cause. The individual servers that are the managed nodes need to allow for a user to assume sudoer privileges without prompting for a password.
Procedures
You will need a user that is a sudoer and that does not get prompted for a password when running commands with sudo. Create this user. For this example we will call it "cooluser". You may substitute "cooluser" with "jdoe" or whichever user will be configured to run the Ansible playbooks.
1. Run this command: sudo visudo
2. Inside this file that you will be able to modify, find these two lines:
## Same thing without a password
# %wheel ALL=(ALL) NOPASSWD: ALL
3. Enter this stanza underneath the above lines (but replace "cooluser" with the username that will run the shell scripts):
cooluser ALL=(ALL) NOPASSWD:ALL
4. Save the changes. If visudo opened the file in vi, press shift-z, shift-z (i.e., "ZZ" with no quotes) to save and exit.
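The stanza from step 3 can be sketched as a small Python helper that builds the line for a given username. This is only a sketch: `nopasswd_rule` is a hypothetical name, and the username pattern is a deliberately conservative assumption, not the full set of names sudo accepts. The resulting line should still be entered through visudo, which checks the syntax when you save.

```python
import re

# Conservative pattern for illustration only (an assumption): a lowercase letter
# or underscore first, then lowercase letters, digits, underscores, or hyphens.
_USERNAME = re.compile(r"[a-z_][a-z0-9_-]*")


def nopasswd_rule(username):
    """Return the sudoers stanza from step 3 for the given username."""
    if not _USERNAME.fullmatch(username):
        raise ValueError(f"unexpected username: {username!r}")
    return f"{username} ALL=(ALL) NOPASSWD:ALL"


if __name__ == "__main__":
    print(nopasswd_rule("cooluser"))
```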
If you want to see how to write a sample playbook, see the article "How Do You Run an Ansible Playbook to Configure 2 GB of Swap Space on Every Linux Server?"
Problem scenario
You want to use pip commands for Python packages, and you are using a SUSE server. You want to install pip on a Linux SUSE server.
Solution
Run this command: sudo zypper -n install python-pip
Problem scenario
You have an EC2 instance running Red Hat Enterprise Linux. You want to use the Python SDK for AWS called Boto. What should you do?
Solution
Prerequisites
Install pip. See this link if you do not know how.
Procedure
1. Run this command: sudo pip install boto3
2. You are done. The rest is optional.
3. You may want to install and configure the AWS CLI. But this is optional if you place the AWS access ID and secret access key in the Python programs themselves. For directions on installing and configuring the AWS CLI, see this posting.
4. Here is a sample program if you do not have AWS CLI installed and configured. This program merely lists S3 buckets that have been created. It is not destructive.
# Usage instructions
# 1. replace "aaa111" with the aws_access_key_id of your account
# 2. replace "bbb222" with the aws_secret_access_key of your account
# If you are not sure how to find these credentials do the following:
# To find the AWS Access Key ID and AWS Secret Access Key, in the AWS console, click on your name in the upper right hand corner.
# Then click on "My Security Credentials." Click "Create New Access Key." Then click "Show Access Key."
# To run it, call it test3.py and run it with a command like this: "python test3.py"
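import boto3

# Create an S3 resource using the credentials supplied above
s3 = boto3.resource('s3', aws_access_key_id='aaa111', aws_secret_access_key='bbb222')

# Print the name of every bucket in the account (read-only)
for bucket in s3.buckets.all():
    print(bucket.name)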
5. As a reference, you may go to this external link to learn more about Boto's features.
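As an alternative to typing the keys into the program itself, they can be read from environment variables. The sketch below assumes the variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY have been exported in the shell; the helper names `credentials_from_env` and `list_bucket_names` are hypothetical. boto3 is imported inside the second function so that the first helper can be tried without it.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def credentials_from_env(env=None):
    """Build the keyword arguments for boto3.resource from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def list_bucket_names(env=None):
    """Return the names of all S3 buckets; like the program above, it only reads."""
    import boto3  # imported here so credentials_from_env works without boto3

    s3 = boto3.resource("s3", **credentials_from_env(env))
    return [bucket.name for bucket in s3.buckets.all()]
```

Calling `list_bucket_names()` from a script or the Python prompt gives the same bucket names that the sample program prints.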
The Elastic Stack used to be called the ELK Stack; this link provides more information.
Applied Network Security Monitoring: Collection, Detection, and Analysis by Chris Sanders and Jason Smith
The Art of Monitoring by James Turnbull
ElasticSearch 5.0 Cookbook - Third Edition by Alberto Paro
Elasticsearch Blueprints by Vineeth Mohan
Elasticsearch: A Complete Guide by Bharvi Dixit, Rafal Kuc, Marek Rogozinski and Saurabh Chhajed
ElasticSearch Cookbook, Second Edition by Alberto Paro
Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine by Clinton Gormley and Zachary Tong
Elasticsearch Essentials by Bharvi Dixit
Elasticsearch for Hadoop by Vishal Shukla
Elasticsearch in Action by Radu Gheorghe, Matthew Lee Hinman and Roy Russo
ElasticSearch Indexing by Huseyin Akdogan
ElasticSearch Quick Start: An introduction to ElasticSearch in tutorial form. by Joel Abrahamsson
Elasticsearch Server - Third Edition by Rafal Kuc and Marek Rogozinski
Kibana Essentials by Yuvraj Gupta
Learning Elasticsearch by Abhishek Andhavarapu
Learning ELK Stack by Saurabh Chhajed
Learning Kibana 5.0 by Bahaaldine Azarmi
The Logstash Book by James Turnbull
Mastering ElasticSearch 5.0 - Third Edition by Bharvi Dixit
Mastering Elasticsearch, Second Edition by Rafal Kuc and Marek Rogozinski
Monitoring ElasticSearch by Dan Noble
NoSQL Injection for Elasticsearch by Gary Drocella
Relevant Search: With applications for Solr and Elasticsearch by Doug Turnbull and John Berryman
We worked very hard to write an original twelve-question Big Data Quiz. Please do not expect another posting (like we normally post) for several days. Please be sure to check out the Big Data Quiz!
Problem scenario
An IP address is pingable from one Linux server. On this server there are no Docker containers running. A traceroute reveals that this IP address is one hop away from your server. How do you find out what this IP address belongs to?
Solution
Use this command: ip addr show
You may also want to use this command: route -n
For a more thorough guide to troubleshooting network problems, see this posting.
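The output of "ip addr show" lists each interface with its address in CIDR notation. To see which local network the unknown IP address falls into, those values can be compared using Python's standard ipaddress module. This is a minimal sketch; the interface names and addresses are made-up sample values, and `find_matching_interfaces` is a hypothetical helper.

```python
import ipaddress


def find_matching_interfaces(target_ip, interface_cidrs):
    """Return (name, relation) pairs for interfaces whose subnet contains target_ip.

    relation is "local address" when the IP is assigned to the interface itself,
    otherwise "same subnet".
    """
    target = ipaddress.ip_address(target_ip)
    matches = []
    for name, cidr in interface_cidrs.items():
        iface = ipaddress.ip_interface(cidr)
        if target == iface.ip:
            matches.append((name, "local address"))
        elif target in iface.network:
            matches.append((name, "same subnet"))
    return matches


if __name__ == "__main__":
    # Sample values in the style of "ip addr show" output (assumed, not real)
    sample = {"lo": "127.0.0.1/8", "eth0": "10.0.0.5/24", "virbr0": "192.168.122.1/24"}
    print(find_matching_interfaces("192.168.122.1", sample))
```

A "local address" result means the IP address belongs to the server itself, for example to a virtual bridge interface.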
Big Data Quiz
1. What does EDH stand for?
a. Enterprise Data Hub
b. Extract Develop Hadoop
c. Extract Decide Haul
d. Extract Data Hadoop
2. Gartner, Informatica and MapR think "data lakes" should be referred to as what?
a. data warehouses
b. data dams
c. data mills
d. data reservoirs
3. MapReduce is to Hadoop as ___________ is to Spark
a. Storm
b. Vertice Algorithm
c. Directed Acyclic Graph
d. RDD
e. Memory
4. RDD stands for what in Spark?
a. Really Different Data
b. Resilient Distributed Dataset
c. Real Developed Data
d. Reliable Data Distribution
5. Which three file systems are recommended to be used with HDFS on top?
a. cifs
b. ext3
c. ext4
d. gfs
e. hfs
f. JFS
g. nfs
h. reiserfs
i. vfat
j. XFS
6. If a Hadoop cluster had nodes that cost $15,000 each, would an HP Vertica or a Teradata solution cost more or less? Choose two.
a. HP Vertica would be cheaper
b. HP Vertica would be more expensive
c. Teradata would be cheaper
d. Teradata would be more expensive
7. What is "a scalable and fault-tolerant stream processing engine built on the Spark SQL engine."?
a. Structured streaming
b. Beam
c. Continual application
d. Storm
8. What is a framework that allows you to implement streaming and batch data processing jobs that can run on any execution engine?
a. Apache Apex
b. Apache Beam
c. Apache Cassandra
d. Apache Flink
e. Apache Storm
9. Which of the following does not need Hadoop (choose two)?
a. Apache Apex
b. Apache Flink
c. Apache Spark
d. Apache Tez
10. Which of the following is a "Hadoop YARN native platform" (thus dependent on Hadoop) and a type of "unified stream and batch processing engine"?
a. Apache Apex
b. Apache Beam
c. Apache Cassandra
d. Apache Delta
e. Apache Flink
11. What company provides a commercial version of Apache Spark that was founded by the people who invented Apache Spark?
a. Data Pipeline Gurus, LLC
b. Databricks
c. Hotfire Software
d. Zephyr Data
12. What is Microsoft's version of Hadoop?
a. MS Knowledge
b. BigTable
c. HDInsight
d. Datica
e. Kinesis
13. What are examples of a Directed Acyclic Graph?
a. A typical ETL process
b. The npm package manager
c. YARN
d. Spark operating on RDDs via stages which involves sub-tasks
e. Apache Airflow's pythonic schedule of phases for dynamic processing
f. All of the above
g. None of the above
14. At what stage in the MapReduce process does the "shuffle" phase happen?
a. Before the map stage
b. After the map stage and before the reduce stage
c. After the reduce stage
d. None of the above
15. How does Hadoop support high availability for your name node?
a. Via the secondary namenode
b. A standby namenode only in proprietary Hadoop versions
c. A standby namenode in open source or proprietary Hadoop versions
d. N/A. There is no native Hadoop support for highly available namenodes
For answers, see this posting.
Problem scenario
You have Ansible installed as a control server (a centralized server to push down configurations to other servers). You want to use it to manage another server. How do you configure the other server to be a managed node? In other words, how do you configure Ansible to push configuration changes down to servers?
Solution
Prerequisite
Ansible must be installed. If you need directions, see this posting if you are using RHEL. If you are using Linux SUSE and need directions for installing Ansible, see this posting.
Procedures
1. Configure passwordless SSH from the Ansible server to the managed node. If you are not sure how to do this, see this posting.
2. On the Ansible control server (not the managed node), modify the /etc/ansible/hosts file. (This directory path would normally never exist on a managed node. Only a server with Ansible would have /etc/ansible/.) If the directory /etc/ansible/ does not exist on the Ansible server, create it. Add these two lines to the hosts file in /etc/ansible/ but replace x.x.x.x with the internal IP address of the managed node (change "group_name" and "alias" to whatever you like):
[group_name]
alias ansible_ssh_host=x.x.x.x
Each word in the first position (where the word "alias" is) should be unique. The "#" works as a comment in this file. You may comment out lines as you architect your environment.
3. Run this command as a test: ansible -m ping all
4. To be even more sure, run this command: ansible -m shell -a 'free -m' all
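With more than a few managed nodes, the two-line entry from step 2 can be generated instead of typed. This is a minimal sketch; `render_inventory` is a hypothetical helper, and the group name, aliases and addresses are made-up sample values. It keeps the ansible_ssh_host variable used above (newer Ansible releases call it ansible_host) and rejects duplicate aliases.

```python
def render_inventory(group_name, hosts):
    """Render one inventory group in the format used for /etc/ansible/hosts.

    hosts is a list of (alias, ip_address) pairs; every alias must be unique.
    """
    seen = set()
    lines = [f"[{group_name}]"]
    for alias, ip_address in hosts:
        if alias in seen:
            raise ValueError(f"duplicate alias: {alias}")
        seen.add(alias)
        lines.append(f"{alias} ansible_ssh_host={ip_address}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values (assumed); print the text to review before adding it to the file
    print(render_inventory("webservers", [("node1", "10.0.0.11"), ("node2", "10.0.0.12")]))
```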
Problem scenario
You want to install Jenkins on an Ubuntu Linux server without running an "apt-get upgrade" command. (You are ok with installing an "apt-transport-https" package.) What should you do?
Solution
See this posting because the directions work for Ubuntu in AWS and Debian in GCP.
(If you can run "apt-get upgrade", and you do not want to install "apt-transport-https", then see this posting to install Jenkins 2.x on Ubuntu Linux.)