How Do You Install Apache Cassandra on Linux SUSE?

Problem scenario
You are running Linux SUSE and want to install Apache Cassandra.  How do you install Apache Cassandra on Linux SUSE?

Solution
Prerequisites
i.  This solution assumes that you have Java installed.  If you need assistance, see this link.

ii.  This solution assumes your server has a total of 2.5 GB of memory (any combination of RAM and virtual memory).  If you need to create virtual memory (aka swap space), see this article.  (Less than 2.5 GB may work.  We experienced trouble with 0.5 GB of RAM.)  If you would rather upgrade your AWS server (and do not mind paying more for RAM because you need the hard drive space to not be allocated to virtual memory), see this posting.  If you would rather resize your GCP server (for the same reason), see this posting.

iii.  This assumes that you have installed Apache Ant.  If you need directions, see this posting.

iv. This solution assumes that you have installed screen (the command line utility) and Git.  To install screen and Git, use this command:  sudo zypper -n install git screen
If you cannot install screen, you will need to open a second terminal session instead of using the screen command, and you will skip step 4.d after you do 4.c.  With two terminal sessions, you can start Cassandra in one terminal and allow the logging to stream to the screen, and then do step #5 of the procedures below in the second terminal.  If you did install screen, follow the directions as they are written.

Procedures
(If you are using a non-AWS Linux server, you will have to change the "ec2-user" in the script.)

1.  Run this script (e.g., create /tmp/precas.sh and run it with sudo bash /tmp/precas.sh):

usr=ec2-user  # replace "ec2-user" with the username who will run cassandra
cd /var/lib/
git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
mkdir -p /var/log/cassandra/log
mkdir -p /var/lib/cassandra/{data,saved_caches,commitlog}
groupadd cassandra
usermod -aG cassandra $usr
usermod -aG cassandra root
chown -R root:cassandra /var/lib/cassandra/
chown -R root:cassandra /var/log/cassandra/
chmod 777 /var/lib/cassandra/data/
chmod 777 /var/lib/cassandra/saved_caches/
chmod 777 /var/lib/cassandra/commitlog/
chmod 770 /var/log/cassandra/log/
mkdir /var/lib/cassandra/logs
chown $usr:cassandra *  
echo "log4j.appender.R.file=/var/lib/cassandra/log/system.log" >> /var/lib/cassandra/conf/log4j-server.properties

2.  Manually modify cassandra.yaml.  Run this:

sudo vi /var/lib/cassandra/conf/cassandra.yaml

Uncomment these four lines (three settings); there should be no leading space (space on the far left) for the data..., commit..., and saved... lines. (There can be leading spaces before the "- /var/lib..." line.)

data_file_directories:
   - /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

saved_caches_directory: /var/lib/cassandra/saved_caches

Comment out this stanza:

cluster_name: 'Test Cluster'

It will look like this when you are done:

#cluster_name: 'Test Cluster'

Save the changes (with "ZZ").
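
If you would rather script the edits above than make them by hand in vi, a sed filter along the following lines may work.  This is only a sketch: the function name edit_cassandra_yaml is made up for illustration, and it assumes your copy of cassandra.yaml carries each directory setting as a commented-out line beginning with "#" (check the file first).  It prints the edited text to standard output and does not modify the file.

```shell
# Hypothetical helper: print an edited copy of a cassandra.yaml-style file.
# Assumption: the directory settings are present but commented out with "#".
edit_cassandra_yaml() {
  sed -e 's|^#[[:space:]]*\(data_file_directories:\)|\1|' \
      -e 's|^#\([[:space:]]*- /var/lib/cassandra/data\)|\1|' \
      -e 's|^#[[:space:]]*\(commitlog_directory:\)|\1|' \
      -e 's|^#[[:space:]]*\(saved_caches_directory:\)|\1|' \
      -e 's|^\(cluster_name:\)|#\1|' \
      "$1"
}

# Example (review /tmp/new.yaml before copying it over the original):
#   edit_cassandra_yaml /var/lib/cassandra/conf/cassandra.yaml > /tmp/new.yaml
```

Writing to a separate file first lets you compare the result with the original (e.g., with diff) before replacing anything.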

3.  Run these two commands:

cd /var/lib/cassandra
sudo /usr/local/apache-ant/bin/ant
# "sudo ant release" with no quotes may be necessary

Ignore the output about eclipse-warnings such as "Potential resource leak" or
"[java] 4 problems (4 errors)" or "BUILD FAILED /var/lib/cassandra/build.xml:1820: Java returned: 255."

4.a.  Make sure your user has the ability to write to /var/lib/cassandra/logs/
You may want to run this: sudo chown jdoe /var/lib/cassandra/logs #replace "jdoe" with the user that will start the Cassandra service

4.b. Run "hostname". Make sure the output is associated with 127.0.0.1 in /etc/hosts
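
The check in 4.b can also be scripted.  The sketch below uses a made-up function name (host_maps_to_loopback) and simply looks for the given name on a 127.0.0.1 line of a hosts-style file; it does not edit anything.

```shell
# Hypothetical helper: succeed if the given hostname appears on a
# 127.0.0.1 line of a hosts-style file (default: /etc/hosts).
host_maps_to_loopback() {
  awk -v name="$1" '
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break        # rest of the line is a comment
        if ($i == name) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  ' "${2:-/etc/hosts}"
}

# Example:
#   host_maps_to_loopback "$(hostname)" || echo "add $(hostname) to the 127.0.0.1 line in /etc/hosts"
```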

4.c. Run these commands:  

screen
./bin/cassandra

4.d. Hold control and tap "a" and then "d".  (Ctrl-a, Ctrl-d.) Now go to step #5. (If you did not install screen or run the screen command, create a duplicate terminal session and skip to step #5 without doing step 4.d.)

5.  Wait 30 seconds, and then run these two commands:  

cd /var/lib/cassandra
./bin/cqlsh

How Do You Write a PHP Web Page That Asks for a SQL Table Name Then Returns the Contents after the User Clicks a Button?

Problem scenario
You want users to be able to enter a table name and then see the content of the table.  How do you create a webpage with a text field and a submit button that will display the content of the table if it exists?

Solution
Overview
This assumes you have the LAPP stack set up on a single server.  To set up the LAPP stack on Ubuntu, run this command without quotes: "sudo apt-get -y install php5 apache2-bin postgresql postgresql-contrib php-pgsql"

This example below challenges the users for Postgres credentials via the web page.  You could hard code a username and password in the viewer1.php page.  You could also leave out the fields for a username and password in the first web page (login1.php).  Then the web page would allow for anonymous users to browse to the site and enter a table name.  The database is hard-coded in viewer1.php, but you could change these two files to work with a database that the user enters.

Warning:  This is not secure against SQL injection. This is for informational purposes only.  This could be done in a secure, development environment or laboratory.

1.  In /var/www/html/ create a file named login1.php.  Have this be the contents:

<html>
<body>
<h2> Want to see a table?  Enter your credentials and the table name.</h2>
<form action="viewer1.php" method="post">
UserName: <input type="text" name="username"><br>
Password: <input type="password" name="password"><br>
TableToView: <input type="text" name="table"><br>
<input type="submit">
</form>

</body>
</html>

2.  In /var/www/html/ create a file named viewer1.php.  Replace "circle" below, but otherwise use the following as the contents:

<?php
   $part1 = $_POST["username"];
   $part2 = $_POST["password"];
   $part3 = $_POST["table"];
   $comp = 'user='.$part1.' password='.$part2 ;
   $host        = "host=127.0.0.1";
   $port        = "port=5432";
   $dbname      = "dbname=circle";
   $credentials = $comp;
   $db = pg_connect( "$host $port $dbname $credentials"  );
   $query = 'select * from ' . $part3 . ';';
   $cc = pg_query($query);
/* This code below was copied from http://razorsql.com/articles/postgresql_column_names_values.html  It was modified somewhat.*/
$i = 0;
echo '<html><body><table><tr>';
while ($i < pg_num_fields($cc))
{
        $fieldName = pg_field_name($cc, $i);
        echo '<td>' . $fieldName . '</td>';
        $i = $i + 1;
}
echo '</tr>';
$i = 0;

while ($row = pg_fetch_row($cc))
{
        echo '<tr>';
        $count = count($row);
        $y = 0;
        while ($y < $count)
        {
                $c_row = current($row);
                echo '<td>' . $c_row . '</td>';
                next($row);
                $y = $y + 1;
        }
        echo '</tr>';
        $i = $i + 1;
   }

pg_free_result($cc);
echo '</table></body></html>';
?>

3.  Go to a web browser.  Go to http://x.x.x.x/login1.php.  Enter the credentials and the table name of the database you want to see.

How Do You Optimize Data Warehouse Performance?

Problem scenario
You have a data warehouse that has not been performing as well as you would like.  What are some things that you can do to optimize the performance?

Solution
1.  Schedule processes such as backups or ETL jobs at different times from hours of peak demand.  Rebuilding indexes can be a good idea or a bad one depending on when it is performed.  Be mindful of when statistics (metadata about the data that the query optimizer relies on) are updated.  Transactions to a database involve locks.  In memory-constrained situations, the engine may be unable to hold many fine-grained locks (e.g., row-level locks), so a "lock escalation" happens.  This involves more rows or even entire tables being locked and unavailable to other processes.  For a primer on avoiding database lock contention, see this external article.

It may make sense to have a time window for certain batch processing to take place.  Ad hoc tasks can happen inside or outside certain windows.

2.  Use indexes that are neither too complex nor too simple.  Some indexes, in an effort to make them "covering" (holding every column that a given query needs), become too complex for the associated queries, and the queries are not as fast as they otherwise would be.  The indexes should be made for the most demanding (e.g., frequent) queries.  This may mean creating a composite key for the uniqueness of the key.  If the indexes are general purpose, ensure the key is on a unique row.  Data warehouses are usually not subject to regular OLTP activity, so they are good candidates for indexes.

3.  Ensure that where clauses filter out unnecessary rows.  Some analytical processing does not need to fetch any more rows than a few.  This is obvious, but before investing large amounts of money in optimization, it is worth mentioning.  There can be greater atomicity between queries when greater filtering is performed.

4.  Model the data in a way that makes sense.  Redesigning table schemas is not a small task.  Remember that for data warehouses, depending on how much analysis and ETL activity there is, you should normalize and denormalize as appropriate.  Some tables perform better with reads if they are denormalized.  OLAP presentations can do well with unnecessary tables (e.g., those that are redundant) that are denormalized.  Analysis can use the entire table.  This can make analytical development easier too.  But creating tables with redundant data can add to the disk space.   Normalization is useful if there are many different types of analysis.  OLAP queries may not cause disk contention if they draw data from different tables that are mutually exclusive.  Avoiding concurrency problems is key.  Some data warehouses span multiple disks and thus highly normalized tables are more likely to quickly support independent queries simultaneously.

5.  Use data types that are appropriate and no larger than they need to be.  While some data types are covariant (compatible) with one another or contravariant (not compatible), for performance reasons it is best to be consistent with data types for evaluation and comparison purposes.  "Mixing different data types can cause performance problems, and even if it doesn’t, implicit type conversions during comparisons can create hard-to-find errors," taken from this external site.

SQL Server 2008 introduced a data type called datetime2.  It takes 6 to 8 bytes on disk depending on the chosen precision, so it is never larger than the older datetime data type (8 bytes) and is often smaller.  Surprisingly, datetime2 can also hold greater subsecond accuracy.  Some database tables use suboptimal data types for the columns because they started as a proof-of-concept or prototype.  Analyze the data types and the business' requirements to accommodate no more than what is needed.  In other words, when choosing a column's data type, support the longest string that is required but no longer than that.  Find the simplest numeric data type that meets the business' required level of precision.  Reads and writes to such a column will be faster than if you chose a data type that supported many decimals and negative values when you only needed positive integers.  This practice of choosing smaller (fewer characters in a string) and simpler data types will lead to the most efficient operations down the road.

6.  Use the NOT NULL constraint on columns when you can.  "It’s harder for MySQL to optimize queries that refer to nullable columns," taken from this external site.

7.  Adjust and possibly reduce the logging*.  Some transactions are logged in such detail that the logging slows things down.  Tools like Change Data Capture can be great, but not for sheer speed.  Transaction logging is more complex than "either a SQL statement is logged or not."  TRUNCATE is not logged the same way as DELETE.  You can eliminate all the rows in a table quickly with the TRUNCATE command.  To learn about the differences between DELETE and TRUNCATE, and why people find TRUNCATE to be faster, see this external posting.

DDL commands can be logged.  You may want to determine if this is necessary.  We recommend  keeping logging settings the same in all environments (from development to production) because parity in the development pipeline is important.

8.  Reconfigure the underlying storage RAID that supports the data warehouse.  Sometimes  redundancy is not necessary given the transaction logging*.  Disk controllers and disk writing algorithms can be bottlenecks for importing data in the warehouse.

9.  Convert the underlying file system to something that is more minimalistic and fast.  Journaled file systems do their own transaction logging.  This is not necessary if the transaction logging the database does is on a different hard disk from the actual database or data warehouse itself.  The closer to bare metal you place the data warehouse on, the greater speed the I/O activity will be.

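10.  Leverage views.  These can provide all the rows that a given query or group of queries will need.  A plain view is a stored query that is evaluated each time it is used; a materialized (or indexed) view stores precomputed results, so reads against it can be fast.  This can free up the disk to serve other I/O activity.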

11.  Consider upgrading the database engine.  As new versions of databases are released, the database engines can perform much faster.

12.  Consider using data compression.  This can enhance reads and writes.  It may be appropriate for a data warehouse.  Not every database supports compression.  It can also reduce the footprint on the disk.

13.  Real-time replication of the data warehouse may be unnecessary.  Regularly scheduled backups may be enough.  Sometimes data warehouses are an afterthought and are combined with OLTP databases on a given instance.  Depending on your business' needs, replicating other databases may make sense but the data warehouse could be spared.  On the other hand, you may want to balance the load by having a highly available data warehouse solution in different geographic locations.

14.  Rewrite SQL queries and remember big O notation.  Some SQL statements have O(n^2) computational complexity.  If you do not know what this is, do two things: One, make sure your SQL statements do not do more comparisons and intermediate processing than absolutely necessary.  Two, consider worst case scenarios where large numbers of rows are returned and then compared.  You may need to rewrite logic to reduce the number of rows being compared or manipulated.  Some I.T. professionals prefer to process data with a high level programming language and use databases and data warehouses as a basic data store.  If your SQL statements do the processing themselves, you may want to analyze your SQL code carefully.  It may make sense to use Java or Python for the data processing instead of SQL.  If you must stick with rewriting SQL statements, which can be faster than using a traditional, high level programming language, you may want to read this external guide to identifying which SQL statements you will target in your optimization process.

15.  Can data warehouse operations be performed in-memory?  In-memory databases are becoming more common.  Some public clouds have RAM that is available at a low price.  Memory input and output is faster than disk input and output.

16.  Most databases support various types of prioritization.  Designations called "degree of parallelism" specify a dedicated number of CPUs for a process.  Database engines can take hints which can remove a SQL statement from the native optimizer leading to undesirable results.  This is related to scheduling of back ups and reindexing.  But it is a different process altogether.
The trade-offs in prioritizing can be more clear than other optimization techniques.  Keep a global perspective if you are not the only administrator.

17.  Be aware of what is cached and what is not cached in a database.  See this external link about Postgres caching if you have time.  The cache behavior will vary from database system to database system.  Manually manipulating what is cached can help you.  But keep a global perspective, as some processes are automatically cached and predicted by tools that become more sophisticated with every new database version.

*** It is highly recommended to analyze the queries and use other methods above before making hardware upgrades.  ***

18.  Optimize your network.  Use multiple NICs and multiple network paths (through cabling and routers) for the data warehouse server.  This can ensure that there is no TCP/IP bottleneck.  TCP/IP network collisions degrade the performance of a network.  Spikes in bandwidth attributable to business activity or DDoS attacks can leave you susceptible to reduced performance.  You may want to analyze your entire network for optimal performance and reliability.

19.a.  Add more RAM or use faster RAM (with a higher bus speed).  Available memory has a way of getting consumed.  This can enable you to use in-memory databases or keep operations from paging or swapping (writing to disk via virtual memory).  
19.b. In some cases you have enough disk space to warrant increasing the virtual memory at your disposal. To increase the swap space of the operating system, see this link.

20.  Upgrade the CPUs to have more cores, faster speeds (higher GHz), larger caches and faster caches (e.g., L2 instead of L3). If you can afford the higher costs, click here if your data warehouse is running on a server in AWS or click here if the server is in GCP.

21.  Migrate to a faster storage device (e.g., some SANs and NASes perform better than others).

22. From a pragmatic perspective, many SQL databases include data warehouses. If you want advice on optimizing a SQL database, see this posting.

23. At the OS level if you are running Linux, you may want to enable hugepages.
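
Before changing anything, you may want to see how many huge pages the kernel has reserved.  The sketch below reads the HugePages_Total counter from /proc/meminfo; the function name hugepages_total is made up for illustration, and the value 128 in the commented example is arbitrary.

```shell
# Hypothetical helper: print the HugePages_Total value from a
# meminfo-style file (default: /proc/meminfo).
hugepages_total() {
  awk '$1 == "HugePages_Total:" { print $2 }' "${1:-/proc/meminfo}"
}

# Example:
#   hugepages_total                         # 0 means no huge pages are reserved
#   sudo sysctl -w vm.nr_hugepages=128      # reserve 128 huge pages until the next reboot
```

Whether huge pages help depends on the database engine, so consult its documentation before reserving memory this way.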

*  SQL Server has Transaction Logs, Oracle Databases have Redo Logs, Postgres has Write-Ahead-Logs, and MariaDB has binary logs.  To learn more about transaction logging, see this Wikipedia article.

How Do You Troubleshoot the Puppet Error “Could not send report: SSL_connect returned=1 errno=0 state=error: certificate verify failed”?

Problem scenario
You try to connect to Puppet master from Puppet agent for the first time (to get the certificate signed).  You run this command:  puppet agent -t -d

But you get this error:  "Error: Could not send report: SSL_connect returned=1 errno=0 state=error:  certificate verify failed:"

What should you do?

Solution
Are you changing the Puppet master for the Puppet agent?  It is acceptable to configure a Puppet agent to communicate with a new Puppet master server.  If that is what recently happened, or you are desperate to try something new, back up the .pem file on the Puppet agent node.  First find the name of the .pem file.  It will often be like this:

FQDNofPuppetMaster.pem

For the sake of identifying the name, replace "FQDNofPuppetMaster" in the above with the FQDN of the Puppet master server.  It will often be in a directory named "ssl" on the Puppet agent server.  You may want to search for it like this:  sudo find / -name ssl -type d

You may want to back it up to some other directory.  Then delete the original file from the ssl directory.
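
The backup-then-delete step can be done in one go so that the file is only removed if the copy succeeded.  This is a minimal sketch; backup_and_remove_pem is a made-up helper name, and the paths in the commented example are placeholders for whatever the find command above turned up.

```shell
# Hypothetical helper: copy a .pem file into a backup directory and remove
# the original only if the copy succeeded.
backup_and_remove_pem() {
  pem="$1"
  backup_dir="$2"
  if [ ! -f "$pem" ]; then
    echo "No such file: $pem" >&2
    return 1
  fi
  mkdir -p "$backup_dir" &&
    cp -p "$pem" "$backup_dir/" &&
    rm -- "$pem"
}

# Example (placeholder paths; run as a user who can write to the ssl directory):
#   backup_and_remove_pem /path/to/ssl/certs/FQDNofPuppetMaster.pem /root/puppet-pem-backup
```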

Now run the Puppet agent process again:  sudo puppet agent -t -d

The error should go away.  A second potential root cause of this problem is that the port the Puppet master listens on (8140 by default) is blocked between the Puppet agent and the Puppet master server.

How Do You Install Puppet Master on a Debian Linux Server in GCP?

Problem scenario
In Google Cloud Platform you have a Debian Linux server.  You want to install Puppet master on it.  What should you do?

Solution

Prerequisites
We suggest having at least 4.5 GB of memory.  This can be from RAM or a combination of RAM and swap space.  To create 4 GB of swap space you can use this posting as a guide, but remember that it was written for configuring 2 GB of swap space.  You will have to double the 2048 number and the other 2 GB numbers.

Procedures
1.a.  Create a file called puppm.sh in /tmp/ (e.g., vi /tmp/puppm.sh).
1.b.  It should have the following as its content:

echo "deb http://ftp.us.debian.org/debian stable main contrib" >> /etc/apt/sources.list

apt update -y

apt-get -y upgrade init-system-helpers

apt-get -y install ruby-augeas hiera init-system-helpers ruby ruby-deep-merge ruby-shadow facter

curl http://http.us.debian.org/debian/pool/main/i/init-system-helpers/init-system-helpers_1.51_all.deb > /tmp/init-system-helpers_1.51_all.deb

curl http://http.us.debian.org/debian/pool/main/p/puppet/puppet-master_5.4.0-2_all.deb > /tmp/puppet-master_5.4.0-2_all.deb

curl http://http.us.debian.org/debian/pool/main/p/puppet/puppet_5.4.0-2_all.deb > /tmp/puppet_5.4.0-2_all.deb

dpkg -i /tmp/init-system-helpers_1.51_all.deb
dpkg -i /tmp/puppet-master_5.4.0-2_all.deb
dpkg -i /tmp/puppet_5.4.0-2_all.deb

mkdir -p /etc/puppet/code/environments/production/manifests/

2.  Run this command:  sudo bash /tmp/puppm.sh

3. Use "puppet -V" to test it was installed correctly. You are done.

If you want to see how to install Puppet Master on an Ubuntu server in AWS, see this posting.

How Do You Install Apache Cassandra on Ubuntu 16.04?

Problem scenario
You are running Ubuntu 16.x Linux and want to install Apache Cassandra.  How do you install Apache Cassandra on Ubuntu Linux in a single-server configuration?

Solution
These directions will work on Debian 9 Linux too.  They have been tested to work in AWS and GCP.

Prerequisites
i.  This solution assumes that you have installed Apache Ant and Git.  If you need directions installing Ant, see this posting.  To install Git, use this command:  sudo apt-get -y update && sudo apt-get -y install git

ii.  This solution assumes your server has a total of 2.5 GB of memory (any combination of RAM and virtual memory).  If you need to create virtual memory (aka swap space), see this article.  (Less than 2.5 GB may work.  We experienced trouble with 0.5 GB of RAM.)  If you would rather upgrade your AWS server (and do not mind paying more for RAM because you need the hard drive space to not be allocated to virtual memory), see this posting.  If you would rather resize your GCP server (for the same reason), see this posting.

iii.  This solution assumes that you have Java 8 installed.  If you installed Ant with the directions above, you will need to downgrade Java.  To do all of this, run these commands:

sudo apt-get -y remove openjdk-9-jre-headless
sudo apt-get -y update
# If you are running Ubuntu Linux, run this:
sudo apt -y install openjdk-8-jre-headless gcj-4.8-jre-headless default-jdk
# If you are running Debian Linux, run this:
sudo apt -y install openjdk-8-jre-headless default-jdk
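
After the packages are installed, it can be worth confirming that the default java really is version 8 before building.  The sketch below checks the first line of the "java -version" output for a 1.8.x version string; is_java8 is a made-up helper name.

```shell
# Hypothetical helper: succeed if the given line (the first line of
# `java -version` output) reports a 1.8.x (Java 8) runtime.
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   is_java8 "$(java -version 2>&1 | head -n 1)" || echo "The default java is not Java 8"
```

If another version is still the default on Ubuntu or Debian, "sudo update-alternatives --config java" lets you choose which installed runtime is used.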

Procedures
1.  Create this script (e.g., vi /tmp/precas.sh and place the content below):

usr=ubuntu  # replace "ubuntu" with the username who will run cassandra
cd /var/lib/

if [ -d "/var/lib/cassandra" ];
then
  echo "***FAILED TO CLONE git repo***!!!"
  exit 1
else
  echo "if git is installed, cloning the Git repo as normal"
fi

git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
mkdir -p /var/log/cassandra/log
mkdir /var/lib/cassandra/{data,saved_caches,commitlog}
groupadd cassandra
usermod -aG cassandra $usr  # $usr is set at the top of this script
usermod -aG cassandra root
chown -R root:cassandra /var/lib/cassandra/
chown -R root:cassandra /var/log/cassandra/
chmod 777 /var/lib/cassandra/data/
chmod 777 /var/lib/cassandra/saved_caches/
chmod 777 /var/lib/cassandra/commitlog/
chmod 770 /var/log/cassandra/log/
mkdir /var/lib/cassandra/logs
chown $usr:cassandra *  
echo "log4j.appender.R.file=/var/lib/cassandra/log/system.log" >> /var/lib/cassandra/conf/log4j-server.properties

2.   Run the script above:  sudo bash /tmp/precas.sh

3.  Manually modify cassandra.yaml.  Run this:  sudo vi /var/lib/cassandra/conf/cassandra.yaml

Uncomment these four non-blank lines (shown between the quotes, amid blank lines); there should be no leading space (space on the far left) for the data..., commit..., and saved... lines. (There can be leading spaces before the "- /var/lib..." line.)

"
data_file_directories:
   - /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

saved_caches_directory: /var/lib/cassandra/saved_caches
"
Ignore other lines that may be in between these non-blank stanzas above (e.g., blank lines, commented lines, other existing stanzas).

Comment out this stanza:
cluster_name: 'Test Cluster'

It will look like this when you are done:
#cluster_name: 'Test Cluster'

Save the changes (with "ZZ" with no quotes in vi).

4.  Run these two commands:
cd /var/lib/cassandra
sudo /usr/local/apache-ant/bin/ant release
# "sudo ant release" with no quotes may be necessary if an apt command was used to install Ant.

Ignore the output about eclipse-warnings such as "Potential resource leak" or
"[java] 4 problems (4 errors)" or "BUILD FAILED /var/lib/cassandra/build.xml:1820: Java returned: 255."

5.a.  Reboot the server, and go to step 6.  If for some extremely rare reason you cannot reboot the server, do step 5.b.
5.b.  Modify this file (and know that your open source Cassandra installation has been changed without thorough quality assurance):  /var/lib/cassandra/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
sudo vi /var/lib/cassandra/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
Comment out line 390.  It should look like this when you are done:
//     logger.info("Initializing {}.{}", keyspace.getName(), name);

Save the changes.

6.a.  Run these commands: 
screen  # If you were not able to install screen, create a duplicate terminal session and ignore 6.b.
cd /var/lib/cassandra
./bin/cassandra

6.b.  Hold the control button and tap "a" then tap "d".  Ctrl-a and Ctrl-d.

7.  Run these commands from your original terminal session, or if you could not install screen, run these two commands from a second terminal session:

cd /var/lib/cassandra
./bin/cqlsh

How Do You Run an Ansible Playbook to Configure 2 GB of Swap Space on Every Linux Server?

Problem scenario
You want every Linux server to have 2 GB of virtual memory.  You want to transfer a Bash script to each server and run it with sudoer privileges.  How do you transfer a file and execute it as a sudoer user?

Solution
1.  Install Ansible.  If you need directions on how to do this with RHEL, see this posting.  For SUSE, see this posting.

2.  Configure the managed nodes so playbooks can be run.  This typically means that you would need passwordless SSH to be configured between them and the control server (that is the Ansible server).  If you need help with configuring passwordless SSH connectivity, see this posting.  

3.  Configure the user on each server to be a passwordless sudoer.  This means the user can assume sudoer privileges without being prompted for a password.  (This step has nothing to do with SSH.)  If you do not know how, see this posting.

4.  Optional step.  Get a benchmark.  Run this command to see what the current swap space is:

ansible -m shell -a 'free -m' all

Or maybe run these three commands:

ansible -m shell -a 'free -m' all >> /tmp/info.txt
date >> /tmp/info.txt
echo "Will run a playbook shortly ***" >> /tmp/info.txt

5.  Create a .yaml file (e.g., contint.yaml).  It will look like this*:

- hosts: all
  tasks:
  - copy:
      src: /home/cooluser/swapspace.sh
      dest: /home/cooluser/swapspace.sh
      owner: cooluser
      group: cooluser
      mode: 0644

- name: Execute the script
  hosts: all
  remote_user: cooluser
  become: yes
  tasks:
     - name: Execute the script.
       command: bash /home/cooluser/swapspace.sh

6.  In the .yaml file above, change the source (which refers to a directory on the Ansible control server), destination directory (which refers to a path and file on managed nodes, destination servers), owner, group, mode, remote_user, and command to whatever you desire.  Once you are done save the .yaml file.  You will have to create a source file.  This is a regular file separate from the .yaml file itself.  For the content of this file (referred to in the "src" stanza in the .yaml file above with its absolute path and file name), see this posting's step #2; the content of step #2 of that posting, what is called "addswap.sh", is the content you need for this posting's "swapspace.sh."  That link includes a Bash script that will allocate 2 GB of virtual memory (also known as swap space) from the hard drive.  The script works on Debian/Ubuntu, CentOS/RHEL/Fedora, and Linux SUSE distributions.  If you do not change the .yaml file above, place the link's Bash script under the name "swapspace.sh" in /home/cooluser/ on the Ansible server.

7.  Run this command:  ansible-playbook contint.yaml  # substitute "contint" with the name you gave it.  You are now done.

8.  Optional step.  Verify your work.  Run this command to see what the current swap space is:

ansible -m shell -a 'free -m' all

Or maybe run these three commands:

ansible -m shell -a 'free -m' all >> /tmp/info.txt
date >> /tmp/info.txt
echo "Ran the playbook ***" >> /tmp/info.txt

# You can now look at /tmp/info.txt to see your work.

* If you are using a legacy version of Ansible, it will look like this:

- hosts: all
  tasks:
  - copy:
      src: /home/cooluser/swapspace.sh
      dest: /home/cooluser/swapspace.sh
      owner: cooluser
      group: cooluser
      mode: 0644

- name: Execute the script
  hosts: all
  remote_user: cooluser
  sudo: yes
  tasks:
     - name: Execute the script.
       command: bash /home/cooluser/swapspace.sh

How Do You Write a Bash Script to Create Virtual Memory in the Size of 2 GB on a Linux Server?

Problem scenario
You want to dedicate 2 GB of your hard drive to be virtual memory (swap space for memory-intensive applications).  What should you do to script this that will work on any distribution of Linux (e.g., a RedHat derivative including CentOS/RHEL/Fedora, Debian/Ubuntu, or SUSE)?

Solution

Overview
We consider virtual memory to be a hybrid of RAM "and disk space that running processes can use. Swap space is the portion of virtual memory that is on the hard disk, used when RAM is full." The quoted section was taken from StackOverflow.

Warning:  You do not want to run this script more than once.  If there is a problem, you should try to correct it manually with this posting.  This posting modifies the /etc/fstab.  Use at your own risk!  If you want manual directions instead of a script, try this posting.

n.b. This script can work as a Startup script in GCP. When creating an instance in Google Cloud Platform, there is a field for Startup script. You may have to click the link "Management, disks, networking, SSH keys" to see the Startup script text field. The script below has been tested to work with Azure VMs.

Procedures
1.  Create a file called addswap.sh with this command:  vi /tmp/addswap.sh

2.  Place the content below in the addswap.sh file.  Then save it.

distro=$(cat /etc/*-release | grep NAME)

debflag=$(echo $distro | grep -i "ubuntu")
if [ -z "$debflag" ]
then   # If it is not Ubuntu, test if it is Debian.
  debflag=$(echo $distro | grep -i "debian")
  echo "determining Linux distribution..."
else
   echo "You have Ubuntu Linux!"
fi

if [ -z "$debflag" ]
then
   echo "Your Linux is NOT a Debian/Ubuntu derivative"
   echo "You may be using CentOS/RHEL/Fedora or SUSE"
   dd if=/dev/zero of=/mnt/2GB.swap count=2048 bs=1MiB
else
   echo "You are using either Ubuntu Linux or Debian Linux."
   fallocate -l 2G /mnt/2GB.swap
fi

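chmod 600 /mnt/2GB.swap   # restrict permissions before enabling the file as swap
mkswap /mnt/2GB.swap
swapon /mnt/2GB.swap

# Back up the files that are about to be modified.
cp /etc/fstab ~/fstab.bak
cp /etc/sysctl.conf ~/sysctl.conf.bak

# Make the swap file and the swappiness setting persist across reboots.
echo "/mnt/2GB.swap  none  swap  sw 0  0" >> /etc/fstab
echo "vm.swappiness=10" >> /etc/sysctl.conf

swapon -s   # show the active swap areas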

3.  Run the script with this command:  sudo bash /tmp/addswap.sh

With Ubuntu or Debian, the fallocate command finishes almost instantly.  Be prepared to wait for the dd command if you are running a CentOS/RHEL/Fedora or SUSE distribution of Linux.  To see the results of your work, run this:  sudo top -b -n 1 | grep Swap
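
If you want a scripted check instead of reading the top output, /proc/swaps lists one active swap area per line after a header row.  The following is a minimal sketch under that assumption; the function name swap_is_active is hypothetical.

```shell
# Hypothetical helper: succeed if the given path is an active swap area.
# $1 = swap file path, $2 = swaps table to read (defaults to /proc/swaps).
swap_is_active() {
  awk -v p="$1" 'NR > 1 && $1 == p { found = 1 } END { exit !found }' "${2:-/proc/swaps}"
}

# Example use (commented out so that this block only defines the helper):
# if swap_is_active /mnt/2GB.swap; then
#   echo "Swap file is active."
# else
#   echo "Swap file is NOT active."
# fi
```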

How Do You Install Apache Cassandra on a RedHat Derivative of Linux?

Problem scenario
You want to install Apache Cassandra on CentOS, RHEL, or Fedora Linux.  What do you do?

Solution
These directions will deploy Apache Cassandra in a single-server configuration.  This will not create a cluster.  These directions will work for a physical server, a VM, an EC2 instance in AWS, or a virtual machine in GCP.

Prerequisites

i.  This assumes that you have 3 GB of memory (RAM and virtual memory, aka swap space, combined).  If you have only 1 GB of RAM, add 2 GB of swap space.  See this link for more information on adding swap space.  If you need the hard drive space to not be allocated to virtual memory, you are using a public cloud service, and you do not mind spending more money, you can add more RAM instead.  To do this with AWS, see this link.  To do this with GCP, see this link.

ii.  This solution assumes that you have installed screen (the command line utility), Git and Apache Ant.  To install screen and Git, use this command:  sudo yum -y install git screen

If you need directions for Ant, see this posting.  If you cannot install screen, you will need to create a duplicate terminal instead of using the screen command.   If you cannot install screen, you will skip step 6.b after you do 6.a.  With two terminal sessions, in one terminal you can start one process and allow the logging to stream to the screen.  With the second terminal you can do the steps in #7 in the procedures below.  If you did install screen, follow the directions as they are written.
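
The prerequisites above can be checked before you start.  This is a rough sketch, assuming a Linux /proc/meminfo with MemTotal and SwapTotal lines reported in kB; the function names mem_plus_swap_mib and missing_tools are hypothetical.

```shell
# Hypothetical helper: print RAM plus swap in MiB.
# $1 = meminfo file to read (defaults to /proc/meminfo).
mem_plus_swap_mib() {
  awk '/^MemTotal:|^SwapTotal:/ { kb += $2 } END { printf "%d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: print each named command that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example use (commented out so that this block only defines the helpers):
# [ "$(mem_plus_swap_mib)" -ge 3072 ] || echo "Less than 3 GB of RAM plus swap."
# missing_tools git screen ant
```

The 3072 in the example is 3 GB expressed in MiB.  Ant installed under /usr/local/apache-ant/bin may not be on the PATH, so a report that ant is missing is only a hint.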

Procedures
1.  Create this script (e.g., vi /tmp/precas.sh and place the content below):

usr=ec2-user  # replace "ec2-user" with the username who will run cassandra
cd /var/lib/

if [ -d "/var/lib/cassandra" ]
then
  echo "***/var/lib/cassandra already exists; NOT cloning the git repo***!!!"
  exit 1   # directory already exists, abort bash script
else
  echo "if git is installed, cloning the Git repo as normal"
fi

git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
mkdir -p /var/log/cassandra/log
mkdir /var/lib/cassandra/{data,saved_caches,commitlog}
groupadd cassandra
usermod -aG cassandra "$usr"  # $usr is the username set at the top of this script
usermod -aG cassandra root
chown -R root:cassandra /var/lib/cassandra/
chown -R root:cassandra /var/log/cassandra/
chmod 777 /var/lib/cassandra/data/
chmod 777 /var/lib/cassandra/saved_caches/
chmod 777 /var/lib/cassandra/commitlog/
chmod 770 /var/log/cassandra/log/
mkdir /var/lib/cassandra/logs
chown "$usr":cassandra /var/lib/cassandra  # a bare * here would also chown every other entry in /var/lib
echo "log4j.appender.R.file=/var/lib/cassandra/log/system.log" >> /var/lib/cassandra/conf/log4j-server.properties

2.   Run the script above:  sudo bash /tmp/precas.sh

3.  Manually modify cassandra.yaml.  Run this:  sudo vi /var/lib/cassandra/conf/cassandra.yaml

Uncomment the four non-blank lines shown between the quotes below (they form three stanzas, separated here by blank lines).  The data_file_directories, commitlog_directory, and saved_caches_directory lines must have no leading space (no space on the far left).  (The "- /var/lib..." line may keep its leading spaces.)

"
data_file_directories:
   - /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

saved_caches_directory: /var/lib/cassandra/saved_caches
"

Ignore other lines that may be in between these stanzas above (e.g., blank lines, commented lines, other existing stanzas).

Comment out this stanza:
cluster_name: 'Test Cluster'

It will look like this when you are done:
#cluster_name: 'Test Cluster'

Save the changes (with "ZZ" with no quotes in vi).
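
The manual edits in step 3 can also be scripted with sed.  This is only a sketch: it assumes the stock cassandra.yaml carries those lines commented out in the form "# key: value", which may differ between Cassandra versions, and the function name edit_cassandra_yaml is hypothetical.  It keeps a copy of the original file with a .bak suffix.

```shell
# Hypothetical helper: uncomment the three directory stanzas and comment out
# cluster_name in the given cassandra.yaml.  $1 = path to the file.
edit_cassandra_yaml() {
  sed -E -i.bak \
    -e 's|^#[[:space:]]*(data_file_directories:.*)$|\1|' \
    -e 's|^#[[:space:]]*(- /var/lib/cassandra/data)[[:space:]]*$|    \1|' \
    -e 's|^#[[:space:]]*(commitlog_directory:.*)$|\1|' \
    -e 's|^#[[:space:]]*(saved_caches_directory:.*)$|\1|' \
    -e 's|^(cluster_name:.*)$|#\1|' \
    "$1"
}

# Example use (commented out so that this block only defines the helper):
# sudo bash -c "$(declare -f edit_cassandra_yaml); edit_cassandra_yaml /var/lib/cassandra/conf/cassandra.yaml"
```

Review the file afterward (e.g., with diff against the .bak copy) before you continue to step 4.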

4.  Run these two commands:
cd /var/lib/cassandra
sudo /usr/local/apache-ant/bin/ant release
# "sudo ant release" with no quotes may be necessary if ant was installed from an rpm or yum command

Ignore the output about eclipse-warnings such as "Potential resource leak" or
"[java] 4 problems (4 errors)" or "BUILD FAILED /var/lib/cassandra/build.xml:1820: Java returned: 255."

This process may take 10 minutes.

5.a.  Reboot the server (e.g., sudo reboot), and go to step 6.  If for some extremely rare reason you cannot reboot the server, do step 5.b.
5.b.  Modify this file (and know that your open source Cassandra installation has been changed without thorough quality assurance):  /var/lib/cassandra/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
sudo vi /var/lib/cassandra/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
Comment out line 390.  It should look like this when you are done:
//     logger.info("Initializing {}.{}", keyspace.getName(), name);

Save the changes.

6.a.  Run these three commands:
screen
cd /var/lib/cassandra
./bin/cassandra

6.b.  Hold control and tap "a" and then "d" (Ctrl-a, Ctrl-d).  This detaches the screen session and leaves Cassandra running in it.

7.  Wait 30 seconds, and then run these two commands:

cd /var/lib/cassandra
./bin/cqlsh
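
The 30-second wait in step 7 is an estimate, and Cassandra may take longer to start on a slow server.  As an alternative, a small retry loop can keep trying a command until it succeeds.  This is a minimal sketch; the function name retry is hypothetical, and the example assumes that your cqlsh supports the -e option for running a single statement.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# $1 = maximum attempts, $2 = seconds to sleep between attempts, rest = command.
retry() {
  retry_attempts="$1"
  retry_delay="$2"
  shift 2
  retry_count=1
  while [ "$retry_count" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    retry_count=$((retry_count + 1))
    # Sleep only if another attempt will follow.
    if [ "$retry_count" -le "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
  done
  return 1
}

# Example use from /var/lib/cassandra (commented out so that this block only
# defines the helper):
# retry 10 5 ./bin/cqlsh -e 'DESCRIBE CLUSTER' && ./bin/cqlsh
```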

How Do You Upload a Docker Image to Amazon Elastic Container Registry?

Problem scenario
You created your own Docker image.  You want to upload it to a repository so it is available to other servers.  You want to upload a Docker image from your server into ECR (in AWS).  You do not know the name of the repository.  How do you do this?

Solution
Prerequisites

a.  This assumes that ECR has been set up.  If you need assistance, see this posting.
b.  This assumes you have your own Docker image configured locally on a server.  If you need assistance, see this posting.
c.  You must have the AWS CLI installed on a Linux server for authorization to your Elastic Container Registry.  If you need assistance with that, see this posting.

Overview
If you do not know the name of the Docker registry and you are using Amazon Elastic Container Registry (i.e., ECR), do the following to obtain the command to log in.  You will be able to push and pull images after you do steps 1 through 3.

Procedures
1.  Draft this command, but replace "us-east-6" with the region that the ECR is in:
(aws ecr get-login --no-include-email --region us-east-6)
2.  Run the above command that you drafted.  The (very long) result should be what you need to log into the ECR, but you may or may not need a "sudo " before that command.  This command should work on other servers besides the server it was originally displayed from.  
3.  The result of the above command is a draft of the next command you will run.  From the server that you want to push the Docker image from, run the very long command to log into ECR (which may need a "sudo " before it).
4.  Tag your Docker image.  Draft a command such as this, but do the replacements as they are described beneath it:

docker tag value1:value2 123456.dkr.ecr.us-west-6.amazonaws.com/reponame:value2

# Replace "value1" with the repository name of the Docker image (as seen in the results of a "docker images" command)
# Replace "value2" with the tag of the Docker image (as seen in the results of a "docker images" command)
# Replace "123456.dkr.ecr.us-west-6.amazonaws.com/reponame" with the ECR Repository URI (as seen in the results of an "aws ecr describe-repositories" command)

5.  Run the command drafted above (after you have done the substitutions).

6.  Draft and execute a "docker push" command.  Remember to specify the destination Docker registry in the push command.  It will look something like this:
docker push 123456.dkr.ecr.us-west-6.amazonaws.com/reponame:value2

# Replace "value2" with the tag of the Docker image (as seen in the results of a "docker images" command)
# Replace "123456.dkr.ecr.us-west-6.amazonaws.com/reponame" with the ECR Repository URI (as seen in the results of an "aws ecr describe-repositories" command)

7.  You are done.  From another server properly configured (that has gone through steps 1 through 3 above), you can run the command in step #6 with "pull" instead of "push" to copy down the image to another Docker host.
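
The tag and push steps above can be wrapped in a small script.  This is a sketch under stated assumptions, not a tested procedure: the function names ecr_image_uri and ecr_tag_and_push are hypothetical, and the account ID, region, and repository name are placeholders.  Note that "aws ecr get-login" was removed in version 2 of the AWS CLI; the commented lines at the end show the "aws ecr get-login-password" form that version 2 uses instead.

```shell
# Hypothetical helper: build the full ECR image reference from its parts.
# $1 = AWS account ID, $2 = region, $3 = repository name, $4 = image tag.
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical wrapper for steps 4 through 6: tag a local image, then push it.
# $1 = local image (repository:tag); the remaining arguments are the same
# four values that ecr_image_uri takes.
ecr_tag_and_push() {
  ecr_local_image="$1"
  shift
  ecr_target=$(ecr_image_uri "$@")
  docker tag "$ecr_local_image" "$ecr_target" && docker push "$ecr_target"
}

# Logging in with AWS CLI version 2 (placeholder region and account ID):
# aws ecr get-login-password --region us-east-1 \
#   | docker login --username AWS --password-stdin 123456.dkr.ecr.us-east-1.amazonaws.com
```

For example, after logging in, "ecr_tag_and_push value1:value2 123456 us-east-1 reponame value2" would run the same docker tag and docker push commands as steps 4 through 6.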