Extract-Transform-Load (ETL) workflows involve considerable architecture, including a workflow over a network that takes data from a flat file and ingests it into a database. Automation is one way to manage the systems that support ETL. DevOps engineers commonly support database installations and configurations, as well as continual delivery pipelines. Automating such a pipeline (with its automatic deployments) is often similar to automating an ETL process. DevOps engineering, build and release engineering, automation development, and ETL design are all interdisciplinary fields of information technology. This quiz covers both DevOps and ETL topics.
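As a point of reference for the questions that follow, here is a minimal sketch of the flat-file-to-database flow described above. It is only an illustration under stated assumptions: the file contents, the payments table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real database reached over a network.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; a real job would read a file from disk
# or from a network location instead of an in-memory string.
FLAT_FILE = "id,name,amount\n1, alice ,10.50\n2,BOB,3\n"


def extract(text):
    """Extract: parse the CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize the name column."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the (hypothetical) payments table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


# In-memory SQLite is an assumption made to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(FLAT_FILE)))
rows = conn.execute("SELECT id, name, amount FROM payments ORDER BY id").fetchall()
print(rows)
```

Keeping the three stages as separate functions makes each one easy to test and to schedule independently, which is where the overlap with pipeline automation tends to show up.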
1. What is the DevOps tool for databases?
2. What does mung mean?
3. What does idempotent mean?
4. What is the name of the process of actively preparing data for serialization (e.g., data that was not otherwise logically contiguous on disk for a buffer)? This process may include modifying data from one programming language or interface so that it is compatible with a different programming language or interface.
a. Almquist variation
c. scrum transition
5. How is an imperative process different from a declarative process?
6. What is a common tool that both ETL Developers and DevOps Engineers use?
7. Which of the following can you not create an AWS Data Pipeline with?
a. AWS Management Console
b. AWS Command Line Interface
c. AWS SDKs
d. AWS APIs
e. None of the above
8. Mesos clusters cannot work with both HDFS and DigitalOcean?
9. Hadoop YARN cannot act as a scheduler for OpenShift?
10. Which of the following Apache products can create ETL jobs?
11. Which of the following is not an ETL product?
a. IBM InfoSphere Datastage
b. Oracle Warehouse Builder
c. Business Objects XI
d. SAS Enterprise ETL server
g. Apache Hadoop
h. Talend Big Data Integration
12. In Informatica, can mapplets only be used once without logic?
13. Which of the tools below are designed to aid ETL process testing and to validate data warehouses themselves?
a. QuerySurge by Real-Time Technology Solutions
c. Apache Cassandra
d. Apache Stratos
e. ETL Validator by datagaps inc.
14. What is an example of cooked data in the context of ETL/DevOps?
a. Machine-corrupted data (e.g., from disk failure)
b. Content that was corrupted maliciously
c. Cleansed data
d. Intentionally masked data (to hide identities)
15. What is the technique that divides a table of a database into different subcomponents, such as partitioning columns, to improve read and write performance?
a. data marting
b. impedance matching
16. What tool allows you to designate when Docker containers process ETL jobs without manual configuration?
17. Which of the following can readily be used as a superior ETL platform?
d. Note Beak
18. There is consensus that small companies should use Informatica or a supported, proprietary ETL tool as opposed to an in-house developed tool.
19. Which of the following has an open source version?
a. Talend Integration Suite
b. Pentaho Kettle Enterprise
d. All of the above
e. None of the above
20. What is a data lake?
a. A synonym of data warehouse
b. A buffer of streamed data
c. An archive of metadata about previous real-time data streams
d. A pool of unstructured data
21. What is a data swamp?
a. A dense data lake
b. A severely degraded data lake
c. A synonym of a data warehouse
d. A pool of unstructured data
e. An archive of metadata about previous real-time data streams
22. Snappy is the name of which two concepts?
a. The REST API for SnapChat
b. A data compression and decompression library with bindings for several languages
c. A Linux package management system
d. An automation scheduler for Informatica
e. An open source component to migrate SSIS packages to PostgreSQL
23. In a SQL database you have a left table with four rows and a right table with seven rows. What is the highest number of rows that an inner join can return?
d. More than 11
24. Which of the following provide Sqoop based connectors (choose all that apply)?
c. Talend Open Studio
c. Informatica (modern versions)
25. What is a continuous application?
a. The namesake of CA traded on the Nasdaq as CA
b. An application that encompasses data streaming (e.g., ETL processes) from start to finish that adapts itself to the data stream(s) in real-time
c. An application that leverages ETL processing
d. An application receiving continuous integration (or continual integration)
e. An application receiving continuous delivery (or continual delivery)
f. An application receiving continuous deployments (or continual deployments)
g. An application that is always available through fault tolerance and load balancing
26. DevOps expert Gene Kim got his start with a security product called Tripwire, known for its emphasis on changes to files. There is a tool that keeps track of changes to a database. Which product below concerns itself with tracking changes of database schemas?
27. Which product enables you to quickly make copies of SQL Server databases for your Test, QA or development environments? Choose the most accurate answer.
a. Canonical's Juju
b. RedGate's SQL Provision
c. Apache Hamster
d. Apache Numa
28. The SQL Server database backups are not working, or your backup solution falsely reports that it is backing them up successfully. What should you implement for a practical backup solution?
a. Write your own PowerShell script that backs up the database
b. Implement AlwaysOn Availability Groups
c. Implement RedGate's Toolbelt
d. Implement Apache Impala