How Do You Troubleshoot Error Messages in a Bash Script?

Problem scenario
You run a shell or bash script but you receive an error message such as one of the following:

$'\r': command not found
line:55 syntax error: unexpected end of file

Solution
Did you download the file from the internet directly to the Unix system? The root cause could be that there are invisible characters in the file.

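If you can copy and paste the text of the script into a new file on the Linux machine and that copy runs as a script, the invisible characters are to blame. Files created or edited on Windows typically end each line with a carriage return followed by a line feed, and Bash treats the carriage return (shown as $'\r' in the error) as part of the command. The integration between Windows and Linux can have hitches.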

To eliminate these extra characters which interfere with the Bash script's execution, run the following (but substitute "script.sh" with the name of your script):

perl -p -i -e "s/\r//g" script.sh
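If Perl is not available, the same cleanup can be sketched in Python. This is only an illustration of the idea behind the one-liner above; the function name strip_carriage_returns is made up for this example, and the demonstration runs against a throwaway temporary file rather than a real script.

```python
import os
import tempfile
from pathlib import Path

def strip_carriage_returns(path):
    """Remove every carriage return byte from the file in place.

    Returns the number of carriage returns that were removed.
    """
    data = Path(path).read_bytes()
    cleaned = data.replace(b"\r", b"")
    Path(path).write_bytes(cleaned)
    return len(data) - len(cleaned)

# Demonstration on a throwaway file with Windows (CRLF) line endings.
with tempfile.NamedTemporaryFile(suffix=".sh", delete=False) as handle:
    handle.write(b"#!/bin/bash\r\necho hello\r\n")

removed = strip_carriage_returns(handle.name)
cleaned_bytes = Path(handle.name).read_bytes()
os.remove(handle.name)

print(removed)  # 2
```

To use it on a real file, call strip_carriage_returns("script.sh") with the name of your script.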

How Do You Delete VM Instances from GCP That Pertain to GKE?

Problem Scenario
You have some GKE standard clusters that you want deleted. You try to delete them, but they do not go away. What should you do?

Solution
Root cause: Kubernetes clusters are self-healing. They are acting as they were designed. The VM instances belong to a managed instance group, which recreates any instance that you delete. Once the instance group is deleted, you will be able to delete the instances via the web UI or via Google's Cloud Shell.

Procedures
See this link: "How Do You Delete VM Instances from GCP That Pertain to GKE?"

Why Would a MongoDB Container Work on an Ubuntu Host but Not a SUSE Host?

Problem scenario
You have two Linux servers in AWS, and each one has the same flavor (i.e., same amount of RAM and same number of processors). One is Ubuntu and another is SUSE. These commands will create working containers in Ubuntu:

docker run --name name1-mongo -d mongo
docker run --name name2-mongo -d mongo:2

This command will not create a working container in SUSE:

docker run --name name2-mongo -d mongo:2

The container that is created will never start. Its status is "Exited." What is wrong?

Solution
SUSE and Ubuntu may have the same amount of processing power and RAM available. However, if you have only one processor, the containers may not perform correctly. Try resizing the SUSE VM so you have more than one processor. If your server is in AWS, see this link for directions on how to resize it. If your server is in GCP, see this posting on how to resize it.

How Do You Get Azure PowerShell Commands Involving Storage to Work when They Are Returning an Error “not recognized”?

Problem scenario
Many Azure commands are failing. You installed the Azure module. But commands related to "Azure Storage" all return "not recognized as the name of a cmdlet, function, script file or operable program."

You see messages like this:

New-AzureStorageAccount : A parameter cannot be found that matches parameter name 'ResourceGroupName'.
At line:4 char:25
+ New-AzureStorageAccount -ResourceGroupName "contIntGroup" -AccountNam ...
+                         ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [New-AzureStorageAccount], ParameterBindingException
    + FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.WindowsAzure.Commands.ServiceManagement.StorageServices.NewAzureStorageAccountCommand
 
Get-AzureStorageAccountKey : The term 'Get-AzureStorageAccountKey' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the 
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:5 char:16
+ $accountKey = (Get-AzureStorageAccountKey -ResourceGroupName contIntG ...
+                ~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (Get-AzureStorageAccountKey:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
 
New-AzureStorageContext : Cannot validate argument on parameter 'StorageAccountKey'. The argument is null or empty. Provide an argument that is not null or empty, and 
then try the command again.
At line:6 char:89
+ ... text -StorageAccountName $storageName -StorageAccountKey  $accountKey
+                                                               ~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (:) [New-AzureStorageContext], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.WindowsAzure.Commands.Storage.Common.Cmdlet.NewAzureStorageContext
 
New-AzureStorageContainer : Could not get the storage context.  Please pass in a storage context or set the current storage context.
At line:7 char:1
+ New-AzureStorageContainer -Name "templates" -Context $context -Permis ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : CloseError: (:) [New-AzureStorageContainer], InvalidOperationException
    + FullyQualifiedErrorId : InvalidOperationException,Microsoft.WindowsAzure.Commands.Storage.Blob.Cmdlet.NewAzureStorageContainerCommand
 
Set-AzureStorageBlobContent : Could not get the storage context.  Please pass in a storage context or set the current storage context.
At line:9 char:1
+ Set-AzureStorageBlobContent -File "C:\Users\contint\Documents\WindowsPo ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : CloseError: (:) [Set-AzureStorageBlobContent], InvalidOperationException
    + FullyQualifiedErrorId : InvalidOperationException,Microsoft.WindowsAzure.Commands.Storage.Blob.SetAzureBlobContentCommand
 
Set-AzureStorageBlobContent : Could not get the storage context.  Please pass in a storage context or set the current storage context.
At line:10 char:1
+ Set-AzureStorageBlobContent -File "C:\Users\contint\Documents\WindowsPo ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : CloseError: (:) [Set-AzureStorageBlobContent], InvalidOperationException
    + FullyQualifiedErrorId : InvalidOperationException,Microsoft.WindowsAzure.Commands.Storage.Blob.SetAzureBlobContentCommand
 
New-AzureResourceGroupDeployment : The term 'New-AzureResourceGroupDeployment' is not recognized as the name of a cmdlet, function, script file, or operable program. 
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:14 char:1
+ New-AzureResourceGroupDeployment -Name "contintDeployment" -ResourceG ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (New-AzureResourceGroupDeployment:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

What should you do?

Solution

  1. Open PowerShell as administrator.
  2. Run this: Install-Module AzureRM.Storage
  3. If you have been running anti-malware applications on your computer, and it is reasonably secure, you can choose the "Y" option at the prompt about an untrusted repository and proceed to install the modules from "PSGallery".
  4. You are done. Now the commands should work.

How Do You Create a Jenkins CI/CD Pipeline That Starts when Code is Checked into Atlassian Bitbucket?

Problem scenario
To trigger a Jenkins build after code is inserted into a Bitbucket repository, I have two options. I can use a Bitbucket plugin in Jenkins and do the configuration there. Or I can create a webhook in Bitbucket.

I tried both with a great deal of troubleshooting. But neither worked, and there was no error message. I found nothing in the logs. The triggering event clearly happened.

What might be the cause? Is there a third method for doing this?

Possible Solution #1
Is there an intermediate firewall blocking Bitbucket from communicating with Jenkins?

Possible Solution #2
In Bitbucket, are Post-Receive Webhooks enabled? In Settings of a Bitbucket repository there is a Workflow section. Under "Workflow" there should be a "Hooks" section. Post-Receive Webhooks must be enabled.

DevOps and ETL Quiz Answers


1.  What is the devops tool for databases?

a.  QuerySurge
b.  Beehive
c.  Stratos
d.  DBMaestro

Answer: D. DBMaestro

2.  What does mung mean?

Answer:  "Mash until no good."  It refers to a permanent operation on data wherein the changes make the original data irretrievable.  An example would be removing the commas from a .csv file.  With no substituted symbol, it will not be trivial to restore the file to its original state.  Puppet error messages use the word "mung."  Data cleansing in ETL operations can lead to files being unrestorable.

3.  What does idempotent mean?

Answer:  It is an adjective describing an action that has no cumulative effect.  For example, running an idempotent operation one time has the same result as running it ten times.  Sometimes ETL operations are performed multiple times.  If the job is idempotent, no harm is done by running it again.  In the DevOps world, whether or not a configuration management tool operation is idempotent determines whether a DevOps engineer must be mindful of the inadvertent accumulation of changes from multiple runs of the operation.

Appending a line to an /etc/fstab (a file system table file) may be desirable one time, but additional appends could be undesirable.  Such an operation is not idempotent because the appending is cumulative.  Enforcing a configuration in an /etc/hosts file (with erasure and substitution) may happen one million times with no negative consequences.  Such an operation is idempotent because there is erasure.
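The difference can be sketched in a few lines of Python. This is a minimal illustration that works on an in-memory list of lines rather than a real /etc/fstab; the function names and the sample entries are made up for the example, and the idempotent version uses a "add only if missing" check rather than the erasure-and-substitution approach described above.

```python
def append_line(lines, entry):
    """Not idempotent: every call adds another copy of the entry."""
    return lines + [entry]

def ensure_line(lines, entry):
    """Idempotent: the entry is added only if it is missing."""
    return lines if entry in lines else lines + [entry]

fstab = ["/dev/sda1 / ext4 defaults 0 1"]        # hypothetical existing entry
entry = "/dev/sdb1 /data ext4 defaults 0 2"      # hypothetical new entry

appended = fstab
ensured = fstab
for _ in range(3):
    appended = append_line(appended, entry)
    ensured = ensure_line(ensured, entry)

print(len(appended))  # 4: the original line plus three duplicates
print(len(ensured))   # 2: the original line plus one copy of the entry
```

Running the non-idempotent function three times leaves three copies of the entry, while the idempotent one leaves the same result after the first run as after the third.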

4.  What is the process of actively preparing data for serialization (e.g., data that was not otherwise logically contiguous on disk for a buffer) called?  This process may include modifying data from one programming language or interface so it is compatible with a different programming language or different interface.

a. Almquist variation
b. inmoning
c. scrum transition
d. marshalling

Answer: D. http://whatis.techtarget.com/definition/marshalling

5.  How is an imperative process different from a declarative process?

Answer:  ETL processes are almost always imperative.  Imperative processes can involve math and be procedurally intensive.  An imperative ETL process would involve data being cleansed, organized, filtered, extracted, transformed and loaded into a database.  Imperative configurations rely on how a given server is found and involve an order of execution.  The starting point is relevant to the configuration process and programming with intermediate steps may be part of an imperative process.

Declarative processes usually avoid math and computation.  Many configuration management tools have desired states.  The end configuration is declared, and idempotent operations make the changes on destination servers.  Declarative configurations do not ordinarily rely on how a given server is configured.  A declarative process may map an object to its ultimate state.  Some ETL solutions are advertised as being declarative; in 2016, it seems impossible that the mapping could truly be ETL (extract, transform, and load) and declarative.

6. What is a common tool that both ETL Developers and DevOps Engineers use?

Answer:
In Linux, they use the crontab and Bash.

In Windows, they use Task Scheduler and PowerShell.

Jenkins can be a tool for DevOps Engineers and ETL Developers (scheduling and monitoring jobs).  Jenkins is more commonly associated with Build Engineering or Release Engineering.

7. Which of the following can you not create an AWS Data Pipeline with?

a.  AWS Management Console
b.  AWS Command Line Interface
c.  AWS SDKs
d.  AWS APIs
e.  None of the above

Answer: E.  Taken from http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html

8.  Mesos Clusters cannot work with both HDFS and Digital Ocean?

True
False

Answer: False.  See this link for more info:  https://www.digitalocean.com/customers/rockerbox/

9. Hadoop YARN cannot act as a scheduler for OpenShift?

True
False

Answer:  False.  See this link. http://hortonworks.com/blog/openshift-v3-kubernetes-docker-hadoop-yarn/

10.  Which of the following Apache products can create ETL jobs?

a.  Accumulo
b.  Pig
c.  Stanbol
d.  Lucene

Answer: B.  See this link: http://hortonworks.com/apache/pig/

11.  Which of the following is not an ETL product?

a.     IBM InfoSphere Datastage
b.     Oracle Warehouse Builder
c.     Business Objects XI
d.     SAS Enterprise ETL server
e.     Stratos
f.     Informatica
g.     Apache Hadoop
h.     Talend Big Data Integration

Answer: E.  Apache Stratos is not an ETL product.  See this for more info about Hadoop: https://www.datanami.com/2014/09/01/five-steps-to-running-etl-on-hadoop-for-web-companies/

12.  In Informatica are mapplets only able to be used once without logic?

Yes
No

Answer:  No. "Mapplets are reusable objects with transformations and logic very similar to a traditional mapping."  Taken from http://www.howtointegratedata.com/mapplet-in-informatica/ (as of 4/8/20, this link no longer works)

13.  Which of the tools below are designed to aid ETL process testing and validating data warehouses themselves?

a.  QuerySurge by Real-Time Technology Solutions
b.  DBMaestro
c.  Apache Cassandra
d.  Apache Stratos
e.  ETL Validator by datagaps inc.

Answer: A. QuerySurge.  To see more info, see this link:
https://www.mysql.com/why-mysql/case-studies/rtts-querysurge-mysql-embedded.html

14.  What is an example of cooked data in the context of ETL/Devops?

a.  Machine-corrupted data (e.g., from disk failure)
b.  Content that was corrupted maliciously
c.  Cleansed data
d.  Intentionally masked data (to hide identities)

Answer: C.  http://searchdatamanagement.techtarget.com/definition/raw-data

15.  What is the technique that divides a table of a database into different subcomponents, such as partitioning columns, to improve read and write performance?

a.  data marting
b.  impedance matching
c.  sharding
d.  redis

Answer: C.  http://searchcloudcomputing.techtarget.com/definition/sharding

16.  What tool allows you to designate when Docker containers process ETL jobs without manual configuration?

a.  Pachyderm
b.  Chronos
c.   Overwatch
d.  emerge-sync

Answer: B.  For more information, see http://www.midvision.com/blog/5-must-see-docker-big-data-use-cases-that-show-dockers-processing-power

17.  Which of the following can readily be used as a superior ETL platform?

a.  Hadoop
b.  Teradata
c.  Proxmor
d.  Note Beak

Answer:  B.  The source that had more information was http://devops.sys-con.com/node/2079447 (but as of 4/8/20 it no longer works).

18.  There is consensus that small companies should use Informatica or a supported, proprietary ETL tool as opposed to an in-house developed tool.

True
False

Answer:  False.  For more information, see https://oraclesponge.wordpress.com/2006/12/20/a-list-ten-reasons-not-to-use-an-external-etl-tool/

19.  Which of the following has an open source version:

a.  Talend Integration Suite
b.  Pentaho Kettle Enterprise
c.  CloverETL
d.  All of the above
e.  None of the above

Answer: D.  https://adeptia.com/products/etl_vendor_comparison.html

20.  What is a data lake?

a) A synonym of data warehouse
b) A buffer of streamed data
c) An archive of metadata about previous real-time data streams
d) A pool of unstructured data

Answer: D.  See this link or this link for more information.

21.  What is a data swamp?

a.  A dense data lake
b.  A severely degraded data lake
c.  A synonym of a data warehouse
d.  A pool of unstructured data
e.  An archive of metadata about previous real-time data streams

Answer: B.  See this link for more information.

22.  Snappy is the name of which two concepts?

a.  The REST API for SnapChat
b.  A data compression and decompression library with bindings for several languages
c.  A Linux package management system
d.  An automation scheduler for Informatica
e.  An open source component to migrate SSIS packages to PostgreSQL

Answer:  B and C.  For B, this link discusses python-snappy (something related to [Python] development).  For B, this link delves into compression and decompression for the same tool which could pertain to an ETL task.  For C, this link explains how systems administrators could use a different tool called Snappy (not related to the one in B).

23. In a SQL database you have a left table with four rows and a right table with seven rows, what is the highest number of rows that can be returned with an inner join?

a. 0
b. 4
c. 11
d. More than 11

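Answer: D. If the join is on keys that do not have a unique constraint, the inner join could produce more than 11 rows. In the extreme case where every row in both tables has the same key value, the join returns 4 x 7 = 28 rows. You can try this as a demonstration.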
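One way to check this without a database server is Python's built-in sqlite3 module. The sketch below is only an illustration; the table and column names (left_table, right_table, k) are made up, and every row is given the same key value so that each left row matches each right row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE left_table (k INTEGER)")
conn.execute("CREATE TABLE right_table (k INTEGER)")

# Four left rows and seven right rows, all with the same non-unique key.
conn.executemany("INSERT INTO left_table VALUES (?)", [(1,)] * 4)
conn.executemany("INSERT INTO right_table VALUES (?)", [(1,)] * 7)

row_count = conn.execute(
    "SELECT COUNT(*) FROM left_table "
    "INNER JOIN right_table ON left_table.k = right_table.k"
).fetchone()[0]
conn.close()

print(row_count)  # 28, i.e. 4 * 7
```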

24. Which of the following provide Sqoop based connectors (choose all that apply)?

a. Teradata
b. Talend Open Studio
c. Informatica (modern versions)
d. Pentaho

Answer: A, B, C, D
To see the reason behind each option, click on the link corresponding to the letters:
A.
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_teradata-connector-user-guide/content/ch_HortonworksConnectorForTeradata.html
B.
https://www.talend.com/resource/sqoop/
C.
https://kb.informatica.com/howto/6/Pages/19/500711.aspx
D.
https://wiki.pentaho.com/display/EAI/Sqoop+Import

25. What is a continuous application?

a. The namesake of CA traded on the Nasdaq as CA
b. An application that encompasses data streaming (e.g., ETL processes) from start to finish that adapts itself to the data stream(s) in real-time
c. An application that leverages ETL processing
d. An application receiving continuous integration (or continual integration)
e. An application receiving continuous delivery (or continual delivery)
f. An application receiving continuous deployments (or continual deployments)
g. An application that is always available through fault tolerance and load balancing

Answer: B. See this link for more information.

26. DevOps expert Gene Kim got his start with a security product called Tripwire, known for its emphasis on changes to files. There is a tool that keeps track of changes to a database. Which product below concerns itself with tracking changes of database schemas?

a. MongoDB
b. DBVersion
c. Databasegit
d. Liquibase

Answer: D. To learn more, see this external site.

27. Which product enables you to quickly make copies of SQL Server databases for your Test, QA or development environments? Choose the most accurate answer.

a. Canonical's Juju
b. RedGate's SQL Provision
c. Apache Hamster
d. Apache Numa

Answer: B. To read more, see this external site.

28. The SQL Server database backups are not working, or you get false positives that your backup solution is successfully backing them up. What should you choose for a practical backup solution?

a. Write your own PowerShell script that backs up the database
b. Implement AlwaysOn Availability Groups
c. Implement RedGate's Toolbelt
d. Implement Apache Impala

Answer: C. You might not want to implement live clustering depending on your needs and budget. Plus for backups you want a manageable solution for the files to put on to tape drive or to store in Amazon Glacier. So we believe the canonical answer to this question is C. You could write your own PowerShell scripts, but this would not be supported; it could be very involved to make sure it was reliable. To read more about RedGate's Toolbelt, see this page.

29. Which AWS tool can perform ETL jobs? Choose two.

a. DMS (Database Migration Services)
b. DMS (Data Manipulation Service)
c. Glue
d. Cognito
e. Federation

Answer: A and C.
The source of A is page 7 of https://d0.awsstatic.com/whitepapers/Migration/migrating-applications-to-aws.pdf
The source of C is https://aws.amazon.com/glue/

30. Test Kitchen works for which of the following?

a. Chef
b. Terraform
c. PowerShell DSC
d. All of the above

Answer: D. Sources:




Python Quiz Answers

1.  What is an iterator in Python?

A)  A stream of data that is manipulated or interacted with as an object.
B)  A function that returns a namespace.
C)  A module of nested objects.
D)  A function that returns packages.

Answer: A. For more information, see the Python.org glossary.

2.  Which module in Python allows you to translate strings to and from binary formats?

A)  Marshal
B)  Shelve
C)  Pickle
D)  DMAC

Answer: A. For more information, see the official Python.org site here.

3.  Which of the following is a Rich Internet Application toolkit?

A) binascii
B) shelve
C) pyjamas
D) alglib

Answer:  C. For more information, see page 362 of Programming Python: Powerful Object-Oriented Programming(4th Edition) by Mark Lutz, published by O'Reilly in 2011.

4.  Which of the following provides an interface to AWS?

A) binascii
B) botocore
C) sndhdr
D) xdrlib

Answer: B.  For more information, see this external link.

5.  How does Python store an error?

A)  As a static variable inside the interpreter
B)  It normally uses an operating system environmental variable.  But if there were too many arguments, it writes to a buffer outside the interpreter.
C)  Inside a pseudo class file (.pyc) in /tmp/
D)  It raises the error to the exception logger outside of the interpreter
E)  In the internal sqlite database

Answer: A. For more information, see the official Python explanation.

6.  What does the yield keyword do in Python?

A)  It is a CPython mechanism to synchronize threads.
B)  A reserved word to control the flow of execution to support conditional logic with Python generators.
C)  A reserved word that pauses a function from parsing named tuples.
D)  It sets null points to evaluate as zeroes in arithmetic operations.

Answer: B.  For more information, see the Python.org glossary.

7.  Which of the following can allow for non-destructive testing of whether an exception has been set?

A)  pyyaml and pypy modules
B)  PyErr_Config()
C)  PyErr_Clear()
D)  PyErr_Occurred()

Answer: D. For more information, see this official Python document for an explanation.

8.  If you are receiving an error with a Python program that attempts to connect to a network resource with SSL, there is a way to avoid an error.  This error is '"SSL: CERTIFICATE_VERIFY_FAILED" Error'
One workaround involves these two lines of Python code:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

Which of the following most nearly addresses the above two-line solution:

A)  The solution makes the Python program more secure.
B)  The solution makes use of the sys library.
C)  The solution is very inadvisable.
D)  The solution would never work.

Answer:  C.  For more information, see this posting.

9.  What is a generator in Python?

A)  Any class that is a factory design pattern.
B)  A reserved word that is a parent class of all iterators in the program.
C)  An object that controls the CPU of the Python Just-In-Time compiler.
D)  A function that has a yield statement and returns an iterator.

Answer:  D.  For more information, see this external site.
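As a small illustration of answer D, here is a sketch of a generator function. The name countdown is made up for the example.

```python
def countdown(start):
    """Generator function: yields start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)
first = next(gen)        # runs the function body up to the first yield
remaining = list(gen)    # resumes after the yield until the function ends

print(iter(gen) is gen)  # True: the object returned is its own iterator
print(first)             # 3
print(remaining)         # [2, 1]
```

Calling countdown(3) does not run the function body; it returns an iterator, and each next() call runs the body up to the following yield.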

10.  Which two of the following add to thread safety in Python (so different threads do not modify data to have unexpected results)?

A)  metaclass
B)  global interpreter lock
C)  trash collection
D)  overwatch
E)  duck-typing
F)  lbyl
G)  lambda
H)  list comprehension
I)  Pythonic sequence
J)  object slice

Answer:  A) and B)  For more information, see this python.org link.

11.  When trying to install pycrypto you run this:

python setup.py build

and you receive an error like this "warning: GMP or MPIR library not found; Not building Crypto.PublicKey._fastmath." What does it mean?

A) You cannot proceed with installing pycrypto; the installation has been aborted.
B) You can proceed with installing pycrypto; the installation may still work.
C) You have a Python Fabric vulnerability
D) A German edition of pycrypto was already installed and you may or may not be able to proceed.

Answer: B.  You may move on to "python setup.py install".  To be on the safe side, you may want to investigate why the error is happening.  Here is a related external link.

12.  How do you find the version of Tornado (python-tornado) on a RedHat server?

A)  python
>>> print(python-tornado.version_info)

B)  python
>>> import tornado
>>> print(tornado.version_info)

C)  which python-tornado

D)  python-tornado --version

Answer:  B

13.  What does GIL stand for?

A)  Gears Interpreting Language
B)  Good Invention Language
C)  Global Interpreter Lock
D)  Generate Interprocess Loquitur
E)  Global Instant Lookup

Answer:  C. Source is page 1564 of Programming Python: Powerful Object-Oriented Programming by Mark Lutz.  Published by O'Reilly.

14.  Where does the "kw" come from or mean in the **kwargs you see in Python error messages and/or code?

A)  kilowatt (wildcard kilowatt arguments)
B)  You hear stars on the radio.  Radio stations on the West Coast of the U.S. traditionally have call signs that start with the letter "k," and radio stations on the East Coast have call signs that start with the letter "w."  For a radio button to appear in a GUI written in Python, there needs to be arguments. 
C)  keyword
D)  keep working

Answer: C.  *args (one asterisk) collects extra positional arguments into a tuple (e.g., values unpacked from a list).  **kwargs (two asterisks) collects keyword arguments into a dictionary.  For more information see this external link.
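Here is a short sketch of how the two forms behave. The function name describe and the sample values are made up for the example.

```python
def describe(*args, **kwargs):
    """Return the positional and keyword arguments that were received."""
    return args, kwargs

positional, keyword = describe(1, 2, color="red")
print(positional)  # (1, 2)
print(keyword)     # {'color': 'red'}

# The same asterisks unpack a list and a dictionary at the call site.
numbers = [1, 2]
options = {"color": "red"}
same_result = describe(*numbers, **options) == (positional, keyword)
print(same_result)  # True
```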

15. How many different directory locations does the "import" command look to for a .py file when invoked?

A) 0
B) 1
C) 2
D) Often several but it depends

Answer: D. For more information, see this posting.
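To see the locations on your own system, you can print sys.path, which is the list the import system searches in order. The output differs from machine to machine, so no expected output is shown in this sketch.

```python
import sys

# sys.path is the list of locations searched, in order, when a module is imported.
for position, directory in enumerate(sys.path):
    print(position, directory)

search_locations = len(sys.path)
print(search_locations)
```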

16. True or False? A function in Python has to have a return statement.

Answer: False. If you want more information, see this posting.
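A quick way to confirm this is to call a function that has no return statement and inspect what comes back. The function name greet is made up for this sketch.

```python
def greet(name):
    print("Hello, " + name)  # no return statement anywhere in the function

result = greet("world")
print(result is None)  # True: a function without a return statement returns None
```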

17. What is a common way (as of 2019) to start a new thread in Python assuming the proper module has been imported? Choose two.

A) nameofthread = Thread(target=nameoffunction)
B) nameofthread = start_new(nameoffunction)
C) thread.start_new(nameoffunction)
D) thread.start_new_thread(nameoffunction, ())
E) nameofthread = newthread()

Answer: A and D.

We found this works:

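from threading import Thread

def contint():
    print("Hello!")

if __name__ == "__main__":
    # Pass the function itself as the target; writing contint() here would run it in the main thread.
    thread = Thread(target=contint)
    thread.start()
    thread.join()
    print("thread finished...exiting")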

We found this works:

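import _thread as thread
import time

def contint():
    print("Hello!")

if __name__ == "__main__":
    # start_new_thread returns immediately, and the low-level API has no join().
    thread.start_new_thread(contint, ())
    time.sleep(0.5)  # crude wait so the thread can run before the program exits
    print("thread finished...exiting")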

18. What percentage of data types and classes in Python are objects?

A) 0%
B) 25%
C) 50%
D) 75%
E) 100%

Answer: E. The source of this is here. n.b. Python is not a purely object-oriented language because of the way it handles encapsulation. If you want to read more about this, see this Quora answer or this analyticbridge website page.

19. What is a function decorator in Python?

A) A function that uses a function as a parameter and returns a function with a special @ syntax.
B) A function that uses a function as a parameter and returns a function with a special ^ syntax.
C) An anonymous function that passes along parameters.
D) A library module that enhances GUI Python programming.
E) A library module that obfuscates system functions programming.

Answer: A. See this posting for more information.
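Here is a minimal sketch of answer A. The decorator name shout and the decorated function greet are made up for the example.

```python
import functools

def shout(func):
    """Decorator: takes a function and returns a new function that upper-cases its result."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout  # same as writing: greet = shout(greet)
def greet(name):
    return "hello, " + name

print(greet("world"))  # HELLO, WORLD
print(greet.__name__)  # greet
```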

20. What is the recommended way of calling subprocesses in Python 3.5 or higher?

A) Using the os.exec function.
B) Using the os.spawn function.
C) Using the subprocess.open function.
D) Using the subprocess.run() function.

Answer: D. Source: https://docs.python.org/3/library/subprocess.html
Here is an example:

python
>>> import subprocess
>>> from subprocess import PIPE
>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=PIPE, stderr=PIPE)

21. Without "import re" using Python 3, can you use the .find() method to match a pattern in a string?

A) Yes
B) No
C) It depends

Answer: A. Try this program as an example:

string1 = "abcdefghijklmnopqrstuvwxyz"
number_pattern_found = string1.find("jkl")
print(number_pattern_found)

22. Which of the following is true about Python class names?

A) It is mandatory that a class name's first letter be uppercase.
B) It is mandatory that a class name's first letter be lowercase.
C) It is recommended that a class name's first letter be uppercase.
D) It is recommended that a class name's first letter be lowercase.

Answer: C. Source: https://www.python.org/dev/peps/pep-0008/#class-names

23. What is a way to implement a hash table with Python?

A) a nested list
B) a list of tuples
C) a dictionary
D) a nested dictionary
E) all of the above
F) none of the above

Answer: E.
For A, the source is https://www.geeksforgeeks.org/implementation-of-hashing-with-chaining-in-python/
For B and C, the source is http://blog.chapagain.com.np/hash-table-implementation-in-python-data-structures-algorithms/
This helps explain B:
https://coderbook.com/@marcus/how-to-create-a-hash-table-from-scratch-in-python/
This explains D: https://www.edureka.co/blog/hash-tables-and-hashmaps-in-python/

24. To write your own class in Python, what is necessary? Choose all that apply.

A) The __init__() function needs to be present.
B) An encapsulated function with the syntax of two leading underscores __
C) Have an indented block underneath the class definition.
D) Use the class keyword.

Answer: C and D.
The __init__() function is not strictly necessary. We tested it out. You may want to read this: https://www.w3schools.com/python/python_classes.asp

It is not necessary to use encapsulation. To learn how to use it (but it is optional), see this posting (internal for encapsulation). The creator of this quiz tested that you need more than the class keyword. With nothing underneath a class definition, you may see this error when you try to run the Python program: "IndentationError: expected an indented block". The indented block beneath the class definition can be any (or almost any) valid Python statement. You can run a program such as this (and it will not fail):

class Example:
  a = "nothing"

z = Example()

25. Items in a Python set {} have which of the following traits? Choose all that apply.

A) Unindexed
B) Ordered
C) Immutable
D) Potentially a duplicate of another item in a set

Answer: A and C. The items are unindexed and cannot be changed in place (although items can be added to or removed from the set). Source for each one: https://www.w3schools.com/python/python_sets.asp

26. Where are variables stored? Choose all that apply.

A) In the heap if they are local variables
B) In the heap if they are global variables
C) In the stack if they are local variables
D) In the stack if they are global variables

Answer: B and C. Source: https://www.geeksforgeeks.org/how-are-variables-stored-in-python-stack-or-heap/

27. Of the following, when should there be space after an equals sign "="?

A) When there is an equivalence test
B) When there is an initialization of an unannotated function parameter
C) When there is a variable assignment involving a reserved word in Python
D) All of the above
E) None of the above

Answer: A. Source: https://www.python.org/dev/peps/pep-0008/#other-recommendations
B and C are instances when there should be no space after the equals sign.

28. There is no difference between an array and a list in Python. True or False?
Answer: False. Source: https://www.geeksforgeeks.org/difference-between-list-and-array-in-python/

29. What is a namedtuple? Choose the best answer.

A. A tuple with a variable name.
B. A tuple that is not indexed by integers but by attribute/key "strings."
C. A tuple that is indexed by integers and has values accessible via attribute/key "strings".
D. None of the above.

Answer: C. Source page 281 of Learning Python by Lutz. To use named tuples you must run a command interactively (or have a line of code) like this: from collections import namedtuple
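The sketch below shows both ways of reaching the values described in answer C. The type name Point and its fields are made up for the example.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(x=3, y=4)

print(p[0])                  # 3: indexed by integer, like any tuple
print(p.x)                   # 3: the same value by attribute name
print(isinstance(p, tuple))  # True: a namedtuple is still a tuple
```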

30. For concurrency to work in multiuser programs, which is less expensive?

A. Multithreading
B. Multiprocessing
C. Using the GIL
D. None of the above.

Answer: A. Source page 509 of Expert Python Programming by Packt Publishing (the 3rd edition). The 4th edition is available here.

31. What does this code return?

def fun_func(n):
  return lambda a : a * n

tripler = fun_func(3)

print(tripler(15))

A. An error about tripler not accepting parameters.
B. An error about fun_func needing an additional parameter
C. 45
D. None of the above.

Answer: C.

32. What does the third line print?

>>> sample_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> new_list = sample_list[:]
>>> new_list

A. []
B. ['a', 'b', 'c', 'd',]
C. ['e', 'f', 'g', 'h']
D. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

Answer: D.

33. What does divmod(3, 6) return in Python?

A. 2
B. (0, 2)
C. (2, 0)
D. None of the above

Answer: D. It returns (0, 3)
Source: "The divmod() is part of python’s standard library which takes two numbers as parameters and gives the quotient and remainder of their division as a tuple." (This quote was taken from https://www.tutorialspoint.com/divmod-in-python-and-its-application#.)
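
To check this yourself, the following sketch compares divmod() with the floor-division and modulo operators (the variable names are illustrative):

```python
quotient, remainder = divmod(3, 6)

print(quotient, remainder)
# divmod(a, b) matches (a // b, a % b) for integers
print((3 // 6, 3 % 6) == divmod(3, 6))
```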

34. What does the second line print?

sample_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
sample_list[:2]

A. ['a', 'b']
B. ['h', 'i']
C. ['d', 'e', 'f', 'g', 'h', 'i']
D. None of the above

Answer: A.

35. What does the second line print?

sample_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
sample_list[3:]

A. ['a', 'b']
B. ['a', 'b', 'c']
C. ['d', 'e', 'f', 'g', 'h', 'i']
D. None of the above

Answer: C.
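
The three slice forms used in the questions above can be tried side by side. This is a small sketch; the result variable names are made up:

```python
sample_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

first_two = sample_list[:2]     # from the start up to, but not including, index 2
from_fourth = sample_list[3:]   # from index 3 through the end
full_copy = sample_list[:]      # a new list holding every item

print(first_two)
print(from_fourth)
print(full_copy == sample_list, full_copy is sample_list)
```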

36. What is an f-string in Python? Choose the best answer.

A. A format string specified by an "f" that uses variable substitution with {this_type} of syntax
B. A reserved word in Python for a formatted string
C. A special string variable that is mutable
D. Text enclosed in quotes near the definition of a method or class to allow a programmer to document what the method or class does
E. All of the above
F. None of the above.

Answer: A. The "F" can be capitalized or lower case. Source: https://realpython.com/python-f-strings/
Choice C is not correct because strings in Python are immutable.
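
A minimal sketch of an f-string. The variable name this_type is borrowed from choice A; the rest is illustrative:

```python
this_type = "curly-brace"

lower = f"uses {this_type} syntax"   # lower-case prefix
upper = F"uses {this_type} syntax"   # upper-case prefix behaves the same

print(lower)
print(lower == upper)
```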

37. The isspace method looks for which of the following?

A. One or more spaces
B. Tab characters
C. New lines
D. All of the above
E. None of the above

Answer: D. Source: Page 232 of Python 3 Object-Oriented Programming by Dusty Phillips.
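
You can test isspace() on a few sample strings. This is a sketch with made-up sample values; note that an empty string and a string with any non-whitespace character return False:

```python
samples = [" ", "\t", "\n", " \t\n", "", " a "]

# Map each sample to the result of calling isspace() on it.
results = {text: text.isspace() for text in samples}

for text, flag in results.items():
    print(repr(text), flag)
```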

38. Which of the following is a class-related method that Python supports?

A. Class
B. Instance
C. Static
D. All of the above
E. None of the above.

Answer: D. Source: Page 1029 of Learning Python by Lutz.
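
The three kinds of methods can be sketched in one small class. Demo and its method names are hypothetical, chosen only for this example:

```python
class Demo:
    def instance_method(self):
        # Receives the instance as its first argument.
        return self

    @classmethod
    def class_method(cls):
        # Receives the class as its first argument.
        return cls

    @staticmethod
    def static_method():
        # Receives neither the instance nor the class.
        return "static"


d = Demo()
print(d.instance_method() is d)
print(Demo.class_method() is Demo)
print(Demo.static_method())
```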

39. What is metaclass in Python?

A. A keyword referring to the __doc__ (aka doc strings) of a class
B. A keyword referring to the attributes of a class
C. Not a keyword, but a concept referring to the "dunder attribute" __attr__
D. None of the above

Answer: D. B could be acceptable. But we think D is the best answer. Source: https://stackoverflow.com/questions/100003/what-are-metaclasses-in-python

40. When are Python decorators executed?

A. Compile time
B. Run time
C. Both of the above
D. None of the above

Answer: B. Source is page 1272 of Learning Python by Lutz.

41. What does MRO stand for?

A. Maintenance Repair Operations
B. Method Resolution Order
C. Metadata Resolve Operate
D. None of the above

Answer: B. Source: https://www.pythonprogramming.in/what-does-mro-do.html
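
To see a method resolution order, you can inspect the __mro__ attribute of a class. The class names below are made up for this sketch:

```python
class Base:
    pass

class Left(Base):
    pass

class Right(Base):
    pass

class Child(Left, Right):
    pass

# __mro__ lists the classes Python searches, in order, for an attribute.
mro_names = [cls.__name__ for cls in Child.__mro__]
print(mro_names)
```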

42. What does Enum do in Python?

A. It is a reserved word that creates a list of tuples from 1 until the last number.
B. It is a built-in class whose objects are numbered starting at 1.
C. It is a module that can be imported to facilitate creating objects of a class.
D. None of the above.

Answer: C. Source: https://www.tutorialspoint.com/enum-in-python
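
A minimal sketch of importing and using the enum module. Color and its members are hypothetical names used only for illustration:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

print(Color.RED.name)    # the member's name
print(Color.RED.value)   # the member's value
print(Color(2))          # look up a member by value
```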

43. What is the #! pattern called in Python?
___________________________________

Answer: shebang
Source: https://www.quora.com/What-does-mean-in-the-Python-programming-language

44. What does __doc__ do with a programmer-defined Python object? Choose the best answer.

A. It can display help information from the Python interpreter.
B. It prints the comments of a given object.
C. It prints the comments of a given object (but only those in the top-most comment designated by quotes).
D. None of the above

Answer: C. A. is almost true, but we do not think it is the best answer. Run this code if you want to learn more:

class GoldClass:
    def cool_func():
        """ top line in function """
        # this is a fun test
        """ middle line """
        return "some text"


var_1 = GoldClass.cool_func
print(var_1.__doc__)

45. Consider this code:

import random

neat_list = ["dog", "cat", "hamster"]
random.shuffle(neat_list)

What does random.shuffle(neat_list) do?

A. It does an in-place re-arrangement of the neat_list.
B. It returns a deep copy of a randomized version of neat_list.
C. Both of the above.
D. None of the above.

Answer: A.
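
You can confirm the in-place behavior with a quick sketch like this one (the result variable is made up for illustration):

```python
import random

neat_list = ["dog", "cat", "hamster"]
result = random.shuffle(neat_list)   # rearranges neat_list itself

print(result)      # shuffle returns None, not a new list
print(neat_list)   # same three items, possibly in a new order
```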

46. To read a file you might use something like this:

open('foobar.txt', 'r')

Instead of the "r", what might you use to add text to a file?

A. a
B. w
C. Both of the above
D. None of the above

Answer: C. The "w" stands for write; the "a" stands for append. Write will erase previous content whereas append will not.
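
The difference between the two modes can be sketched as follows. The temporary directory and the text written are assumptions for the example; only the file name foobar.txt comes from the question:

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
path = os.path.join(tempfile.mkdtemp(), "foobar.txt")

with open(path, "w") as f:   # "w" creates or truncates the file
    f.write("first\n")
with open(path, "a") as f:   # "a" keeps existing content and adds to the end
    f.write("second\n")
with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f:   # "w" again erases what was there
    f.write("third\n")
with open(path, "r") as f:
    after_write = f.read()

print(repr(after_append))
print(repr(after_write))
```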

47. What does the "\t" signify in Python?

A. It is the syntax to place a given numeric string into a time data type.
B. It is a reference to a tab (e.g., for Python to recognize in a file).
C. It signifies a carriage return (e.g., for printing).
D. This two-character string has no special meaning.

Answer: B. Source: https://stackoverflow.com/questions/22116482/what-does-print-sep-t-mean

48. What does the caret symbol do in the context of a regex statement as follows?

import re
sample = "Very nice"
result = re.search("^Very", sample)

A. The caret symbol finds strings that do not have "Very"
B. The caret symbol finds strings that end with "Very"
C. The caret symbol finds strings that start with "Very"
D. None of the above

Answer: C.

49. What is the difference between a keyword and a built-in?

A. Keywords are part of modules that are imported, but built-ins work without any import statement.
B. Built-ins are part of modules that are imported, but keywords work without any import statement.
C. keywords are set by the user; they are not reserved words.
D. None of the above.

Answer: D. See https://stackoverflow.com/questions/8204542/python3-what-is-the-difference-between-keywords-and-builtins

50. How is index() different from find() in Python?

A. find() returns a Boolean (True or False) while index() returns an integer value
B. find() returns a string while index() returns an integer value
C. when the pattern is not found, they return different things
D. There is no difference.

Answer: C. Source: https://www.programiz.com/python-programming/methods/string/index

find() returns -1 if the pattern is not found, whereas index() raises a ValueError if the pattern is not found.
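
A short sketch of the two behaviors. The sample text and the search string are made up for illustration:

```python
text = "Very nice"

found_at = text.find("zzz")      # no match: find() gives back -1

try:
    text.index("zzz")            # no match: index() raises an exception
    index_error = None
except ValueError as err:
    index_error = err

print(found_at)
print(type(index_error).__name__)
```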

51. What is a quick way to return the key associated with the dictionary item that is the highest value?

A. max(name_of_dictionary, key=name_of_dictionary.get)
B. max(name_of_dictionary.items())
C. max(name_of_dictionary.values)
D. None of the above

Answer: A.
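
Here is a sketch of choice A with a made-up dictionary. Passing the dictionary's get method as the key makes max() compare the values while still returning a key:

```python
scores = {"ann": 70, "bob": 92, "cy": 85}   # hypothetical data

top_key = max(scores, key=scores.get)

print(top_key)
print(scores[top_key])
```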

52. What is the value of int(True) ?

A. 0
B. 1
C. It would produce a command not found error.
D. None of the above.

Answer: B. Successful return codes in Linux are 0. In Python, int(True) is 1. In web applications, they are 2xx.

For more information, see this:
https://askubuntu.com/questions/892604/what-is-the-meaning-of-exit-0-exit-1-and-exit-2-in-a-bash-script
https://www.w3.org/Protocols/HTTP/HTRESP.html

53. How do you create an empty set in Python?

A. Use syntax like this: foobar = {}
B. Use syntax like this: foobar = set()
C. Use syntax like this: foobar = set.empty()
D. None of the above

Answer: B.
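
Choice A is a common mistake because empty braces create a dictionary. A quick sketch (the variable names are illustrative):

```python
braces = {}
empty_set = set()

print(type(braces).__name__)     # empty braces make a dict
print(type(empty_set).__name__)  # set() makes a set
```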

54. Complete the following sentence to answer this question: how does the Python built-in function any() work?

"It accepts an iterable..."

A. and returns a random value from the iterable.
B. and a variable and returns "True" if the variable is in the iterable but it returns False if the variable is not in the iterable.
C. and returns "False" if every element is a zero or an empty string, where a space is considered not empty; otherwise it returns True.
D. and returns "False" if every element is a zero or an empty string, where a space is considered empty; otherwise it returns True.
E. and returns True if any of the variables in the iterable are True but it returns False if every variable is False.
F. and returns the first variable.

Answer: C. We tested it. For more information, see this external posting.
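
A sketch of the kind of test you could run yourself. The sample lists are made up; any() tests the truthiness of each element, so a string holding a single space counts as True:

```python
zeros_and_empty = any([0, ""])       # every element is falsy
with_space = any([0, "", " "])       # " " is a non-empty string, so it is truthy
empty_iterable = any([])             # nothing to test

print(zeros_and_empty)
print(with_space)
print(empty_iterable)
```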

55. What does this code snippet print?

def cool_func(var1, var2):
    sum_vars = var1 + var2 + var3
    return sum_vars

var3 = 100
x = cool_func(2, 7)
print(x)

A. Nothing.
B. "9"
C. "NameError: name 'var3' is not defined"
D. "109"

Answer: D.

56. How is global different from nonlocal?

A. They are functionally equivalent.
B. You need to import sys for nonlocal statements to work.
C. nonlocal applies to functions inside of functions but does not affect variables' values outside of any function. global affects values outside of any function.
D. none of the above.

Answer: C. Source is here.
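
The two statements can be compared in a small sketch. All of the names here (counter, bump_global, outer, inner, total) are hypothetical:

```python
counter = 0

def bump_global():
    global counter        # rebinds the module-level name
    counter += 1

def outer():
    total = 10
    def inner():
        nonlocal total    # rebinds the name in the enclosing function only
        total += 5
    inner()
    return total

bump_global()
outer_result = outer()

print(counter)
print(outer_result)
```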

57. When you are not using Classes in Python, how much of the code relies on modules?

A. 0%
B. Usually about 50%
C. 100%
D. Not enough information to decide; it depends.

Answer: C. Source: Page 745 of Learning Python.

58. What happens when you try to add a duplicate entry to a set in Python?

A. With the .append() syntax or with the .add() syntax the duplicate entry will be added.

B. With the .append() syntax the program will stop processing and an error will show but with the .add() syntax nothing will happen (and the entry won't be added).

C. With the .append() syntax nothing will happen (and the entry won't be added) but with the .add() syntax the program will stop processing and an error will show.

D. Nothing will happen with either .append() syntax or with the .add() syntax, and the duplicate entry will be not added.

Answer: B.
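
A sketch that shows both outcomes. The set contents are made up; the try/except is only there so the example keeps running after the error:

```python
pets = {"dog", "cat"}

pets.add("dog")               # duplicate: silently ignored
size_after_add = len(pets)

try:
    pets.append("dog")        # sets have no append() method
    append_error = None
except AttributeError as err:
    append_error = err

print(size_after_add)
print(type(append_error).__name__)
```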

59. How do you add an item to a list in the first position (instead of the last)?

A. appendleft()
B. insert()
C. prepend()
D. create a new temporary list with the item you want and combine it with the original list
E. none of the above

Answer: B. Source: https://www.geeksforgeeks.org/python-perform-append-at-beginning-of-list/
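
A minimal sketch of insert() with index 0 (the list contents are made up):

```python
letters = ["b", "c"]
letters.insert(0, "a")   # index 0 places the item at the front

print(letters)
```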

60. What is the difference between a method and a function?

A. There are no functions in Python.
B. There are no methods in Python.
C. Functions are custom-designed whereas methods are built-in.
D. Methods are custom-designed whereas functions are built-in.
E. Methods are associated with classes or objects whereas functions are not.
F. Functions are associated with classes or objects whereas methods are not.

Answer: E. Source is here.

61. What went wrong when you print the output of a variable, but you see something like this?

<built-in method strip of str object at 0x7fc87bf4a0f0>

A. You forgot to compile a program into a .pyc file.
B. You forgot to use parentheses "()".
C. You combined a built-in Python reserved word with your own function.
D. None of the above.

Answer: B. Source: A continualintegration.com posting.
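
The following sketch reproduces the situation with a made-up string. Without parentheses you get the method object itself; with them you get the method's return value:

```python
name = "  padded  "

forgot_parens = name.strip     # refers to the method, does not call it
with_parens = name.strip()     # calls the method

print(forgot_parens)
print(repr(with_parens))
```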

62. What is a docstring in Python?

A. Text enclosed in quotes near the definition of a method or class to allow a programmer to document what the method or class does.
B. A reserved word in Python for a formatted string
C. A special string variable that is mutable
D. All of the above
E. None of the above.

Answer: A. Source: Page 44 of Python 3 Object-Oriented Programming by Dusty Phillips.

63. What is the difference between pass and continue?

A. The continue keyword needs "import control" to work.
B. pass sends the interpreter downward in the program, continue sends the interpreter flow to the top of the subsuming for/while loop to process the next item.
C. There is no difference but the reserved words themselves; they are functionally equivalent.
D. None of the above.

Answer: B. The keyword pass is a no-operation placeholder; execution simply moves on to the next statement. The keyword continue jumps to the next iteration of the loop, skipping any remaining statements in the loop body beneath the continue itself.
Source: https://stackoverflow.com/questions/33335740/python-pass-vs-continue
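
A sketch comparing the two in otherwise identical loops (the list names are made up):

```python
with_pass = []
with_continue = []

for n in range(4):
    if n == 2:
        pass                  # does nothing; the append below still runs
    with_pass.append(n)

for n in range(4):
    if n == 2:
        continue              # skips the append below for this iteration
    with_continue.append(n)

print(with_pass)
print(with_continue)
```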

64. What type of data structure is a map?

A. a dictionary
B. a non-dictionary key-value store
C. an ordered dictionary
D. a list
E. a tuple
F. a set
G. None of the above

Answer: G. It is a function. Source: https://www.w3schools.com/python/ref_func_map.asp
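
A short sketch of calling map(). The lambda and the input list are made up; in Python 3 the result is a lazy iterator, so it is converted to a list here to show the values:

```python
doubled = map(lambda n: n * 2, [1, 2, 3])

is_dict = isinstance(doubled, dict)   # a map result is not a dictionary
doubled_values = list(doubled)        # consume the iterator

print(is_dict)
print(doubled_values)
```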

65. Which of the following string methods returns True for a string that holds a negative integer (e.g., "-1")?

A. isdecimal
B. isnumeric
C. isdigit
D. all of the above
E. none of the above

Answer: E. Run the commands here and you will see "False" being returned:

>>> var_test = "-1"
>>> var_test.isnumeric()
False
>>> var_test.isdecimal()
False
>>> var_test.isdigit()
False
>>>

Big Data Quiz with Answers

1.  What does EDH stand for?

a.  Enterprise Data Hub
b.  Extract Develop Hadoop
c.  Extract Decide Haul
d.  Extract Data Hadoop

Answer:  a.  Sources:
http://searchbusinessanalytics.techtarget.com/feature/Hadoop-2-YARN-set-to-shake-up-data-management-and-analytics (Previous link used to work.)
https://vision.cloudera.com/practical-uses-of-an-edh/

2.  Gartner, Informatica and MapR think "data lakes" should be referred to as what?

a.  data warehouses
b.  data dams
c.  data mills
d.  data reservoirs

Answer:  d.  Sources:
https://blogs.informatica.com/2015/02/11/data-streams-data-lakes-data-reservoirs-large-data-bodies/
https://mapr.com/solutions/enterprise/marketing-optimization/
https://infocus.emc.com/william_schmarzo/data-lake-data-reservoir-data-dumpblah-blah-blah/

3.  MapReduce is to Hadoop as ___________ is to Spark

a.  Storm
b.  Vertice Algorithm
c.  Directed Acyclic Graph
d.  RDD
e.  Memory

Answer: c.

The DAG is an integral process for Spark.  MapReduce is an integral process of Hadoop.  The quote "MapReduce™ is the heart of Apache™ Hadoop®." was found on IBM's site.  The quote "Each Spark job creates a DAG of task stages to be performed on the cluster."  was found on this HortonWorks site.

See these links for more information:
https://www.quora.com/What-are-the-Apache-Spark-concepts-around-its-DAG-Directed-Acyclic-Graph-execution-engine-and-its-overall-architecture
http://data-flair.training/blogs/dag-in-apache-spark/
http://data-flair.training/blogs/apache-spark-vs-hadoop-mapreduce/

4.  RDD stands for what in Spark?

a.  Really Different Data
b.  Resilient Distributed Dataset
c.  Real Developed Data
d.  Reliable Data Distribution

Answer:  b.  Source:  http://data-flair.training/blogs/apache-spark-rdd-tutorial/

5.  Which three file systems are recommended as the underlying file system for HDFS?

a.  cifs
b.  ext3
c.  ext4
d.  gfs
e.  hfs
f.  JFS
g.  nfs
h.  reiserfs
i.  vfat
j.  XFS

Answers:  b, c, j.  For more information, see these sources:
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/install_cdh_file_system.html
https://community.hortonworks.com/articles/14508/best-practices-linux-file-systems-for-hdfs.html

6.  If a Hadoop cluster had nodes that cost $15,000 each, would an HP Vertica or a Teradata solution cost more or less?  Choose two.

a.  HP Vertica would be cheaper
b.  HP Vertica would be more expensive
c.  Teradata would be cheaper
d.  Teradata would be more expensive

Answer:  a and c.  Source:  Page 27 of Managing Big Data Workflow for Dummies by Joe Goldberg and Lillian Pierson published by John Wiley & Sons, Inc in 2016.

7.  What is "a scalable and fault-tolerant stream processing engine built on the Spark SQL engine."?

a.  Structured streaming
b.  Beam
c.  Continual application
d.  Storm

Answer:  a.  For more information see this link https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html.

8.  What is a framework that allows you to implement streaming and batch data processing jobs that can run on any execution engine?

a.  Apache Apex
b.  Apache Beam
c.  Apache Cassandra
d.  Apache Flink
e.  Apache Storm

Answer: b.  See this link https://beam.apache.org/ for more information.

9.  Which of the following does not need Hadoop (choose two)?

a.  Apache Apex
b.  Apache Flink
c.  Apache Spark
d.  Apache Tez

Answers: b. and c.

Why not Apex? "Apex is designed to run in your existing Hadoop ecosystem, using YARN to scale up or down as required and leveraging HDFS for fault tolerance."  https://www.infoworld.com/article/3059284/application-development/look-out-spark-and-storm-here-comes-apache-apex.html

To understand why b. is one correct answer, read this:  "Flink is independent of Apache Hadoop and runs without any Hadoop dependencies."  taken from this external site (https://flink.apache.org/faq.html#how-does-flink-relate-to-the-hadoop-stack) that is no longer up. This link corroborates this answer: https://issues.apache.org/jira/browse/FLINK-4315

To understand why c. is one correct answer, read the following:

Do I need Hadoop to run Spark?
No, but if you run on a cluster, you will need some form of shared file system (for example, NFS mounted at the same path on each node). If you have this type of filesystem, you can just deploy Spark in standalone mode.

This was taken from Apache's website.

Why not Apache Tez?  It requires Hadoop YARN according to this site https://tez.apache.org/.

10.  Which of the following is a "Hadoop YARN native platform" (thus dependent on Hadoop) and a type of "unified stream and batch processing engine"?

a.  Apache Apex
b.  Apache Beam
c.  Apache Cassandra
d.  Apache Delta
e.  Apache Flink

Answer: a.  See this link http://apex.apache.org/ for more information.

11.  What company provides a commercial version of Apache Spark that was founded by the people who invented Apache Spark?

a.  Data Pipeline Gurus, LLC
b.  Databricks
c.  Hotfire Software
d.  Zephyr Data

Answer: b.  The founders of Apache Spark started Databricks (https://www.washingtonpost.com/news/the-switch/wp/2016/06/09/this-is-where-the-real-action-in-artificial-intelligence-takes-place/?utm_term=.ac9e0cea115f).

12.  What is Microsoft's version of Hadoop?

a.  MS Knowledge
b.  BigTable
c.  HDInsight
d.  Datica
e.  Kinesis

Answer: c.  Sources:  http://www.itprotoday.com/microsoft-sql-server/use-ssis-etl-hadoop and https://azure.microsoft.com/en-us/services/hdinsight

13. What are examples of a Directed Acyclic Graph?

a. A typical ETL process
b. The npm package manager
c. YARN
d. Spark operating on RDDs via stages which involves sub-tasks
e. Apache Airflow's pythonic schedule of phases for dynamic processing
f. All of the above
g. None of the above

Answer: f. All of the above.

a. because of 1) http://www.cs.uoi.gr/~pvassil/publications/2009_DB_encyclopedia/Extract-Transform-Load.pdf and 2) https://www.d-one.ai/documents/Topological-sorting-and-the-ETL-process-Joonas-Asikainen-D1-Solutions-Zuerich.pdf
b. because of https://medium.com/basecs/spinning-around-in-cycles-with-directed-acyclic-graphs-a233496d4688
c. because of https://medium.com/basecs/spinning-around-in-cycles-with-directed-acyclic-graphs-a233496d4688
d. because of https://data-flair.training/blogs/dag-in-apache-spark/
e. because of https://bigdata-etl.com/apache-airflow-create-dynamic-dag/
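
To make the idea concrete, here is a sketch that models a made-up ETL pipeline as a DAG using Python's standard-library graphlib module (Python 3.9 or later). The step names are hypothetical. Adding an edge that creates a cycle makes the ordering impossible, which is why these workflows must be acyclic:

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the steps in its set (hypothetical ETL steps).
etl_steps = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

run_order = list(TopologicalSorter(etl_steps).static_order())
print(run_order)

# Making "extract" depend on "report" introduces a cycle.
cyclic_steps = dict(etl_steps, extract={"report"})
try:
    list(TopologicalSorter(cyclic_steps).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)
```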

14. At what stage in the MapReduce process does the "shuffle" phase happen?

a. Before the map stage
b. After the map stage and before the reduce stage
c. After the reduce stage
d. None of the above

Answer: b. Source: page 643 of Cracking the Coding Interview
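
The ordering of the phases can be sketched in plain Python with a word count. This is not Hadoop code, only a single-process illustration with made-up input. The shuffle step groups the mapped pairs by key so that each reduce call sees all the values for one key:

```python
from collections import defaultdict

lines = ["big data", "big cluster"]   # hypothetical input

# Map: emit a (word, 1) pair for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the values by key.
shuffled = defaultdict(list)
for word, count in mapped:
    shuffled[word].append(count)

# Reduce: combine the grouped values for each key.
reduced = {word: sum(counts) for word, counts in shuffled.items()}

print(mapped)
print(dict(shuffled))
print(reduced)
```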

15. How does Hadoop support high availability for your name node?

a. Via the secondary namenode
b. A standby namenode only in proprietary Hadoop versions
c. A standby namenode in open source or proprietary Hadoop versions
d. N/A. There is no native Hadoop support for highly available namenodes

Answer: c. See https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html
For why a. is wrong, see http://hadooptutorial.info/secondary-namenode-in-hadoop/

How Do You Take a Partial Back Up of Your Weebly Website?

Problem scenario
You have a Weebly website. You want a partial back-end backup of your website. What should you do?

Solution
1. Go to your Weebly Editor.
2. At the top, go to "Settings"
3. Scroll down to "Archive"
4. Enter your email address and click "Email Archive"
5. The email will have a link to download a portion of the website.

Warning: This will only get some of the current pages. The "Previous" pages (older pages that appear on other pages) will not be downloaded.

A List of Books on Message Queuing Solutions: ActiveMQ, AMQP, WebSphere and ZeroMQ

This is a list of books focusing on message queuing solutions other than RabbitMQ.  For RabbitMQ, see this link.

AMQP PREDICTIVE ANALYTICS REPORT by Gerard Blokdijk
ActiveMQ in Action by Bruce Snyder, Dejan Bosanac and Rob Davies
Code Connected Volume 1: Learning ZeroMQ by Pieter Hintjens
Programming WebSphere MQ with JAVA by Kunal Jaggi
ZeroMQ: Messaging for Many Applications by Faruk Akgul
ZeroMQ: Messaging for Many Applications by Pieter Hintjens