Big Data Quiz
1. What does EDH stand for?
a. Enterprise Data Hub
b. Extract Develop Hadoop
c. Extract Decide Haul
d. Extract Data Hadoop
2. What do Gartner, Informatica and MapR think "data lakes" should be called instead?
a. data warehouses
b. data dams
c. data mills
d. data reservoirs
3. MapReduce is to Hadoop as ___________ is to Spark
a. Storm
b. Vertex Algorithm
c. Directed Acyclic Graph
d. RDD
e. Memory
4. RDD stands for what in Spark?
a. Really Different Data
b. Resilient Distributed Dataset
c. Real Developed Data
d. Reliable Data Distribution
5. Which three file systems are recommended as the underlying file system for HDFS?
a. cifs
b. ext3
c. ext4
d. gfs
e. hfs
f. JFS
g. nfs
h. reiserfs
i. vfat
j. XFS
6. If a Hadoop cluster had nodes that cost $15,000 each, would an HP Vertica or a Teradata solution cost more or less? Choose two.
a. HP Vertica would be cheaper
b. HP Vertica would be more expensive
c. Teradata would be cheaper
d. Teradata would be more expensive
7. What is "a scalable and fault-tolerant stream processing engine built on the Spark SQL engine"?
a. Structured streaming
b. Beam
c. Continual application
d. Storm
8. Which framework allows you to implement streaming and batch data processing jobs that can run on any execution engine?
a. Apache Apex
b. Apache Beam
c. Apache Cassandra
d. Apache Flink
e. Apache Storm
9. Which of the following do not need Hadoop? Choose two.
a. Apache Apex
b. Apache Flink
c. Apache Spark
d. Apache Tez
10. Which of the following is a "Hadoop YARN native platform" (thus dependent on Hadoop) and a type of "unified stream and batch processing engine"?
a. Apache Apex
b. Apache Beam
c. Apache Cassandra
d. Apache Delta
e. Apache Flink
11. Which company, founded by the people who created Apache Spark, provides a commercial version of it?
a. Data Pipeline Gurus, LLC
b. Databricks
c. Hotfire Software
d. Zephyr Data
12. What is Microsoft's version of Hadoop?
a. MS Knowledge
b. BigTable
c. HDInsight
d. Datica
e. Kinesis
13. Which of the following are examples of a Directed Acyclic Graph (DAG)?
a. A typical ETL process
b. The npm package manager
c. YARN
d. Spark operating on RDDs via stages that are broken into sub-tasks
e. Apache Airflow's Pythonic schedule of phases for dynamic processing
f. All of the above
g. None of the above
14. At what stage in the MapReduce process does the "shuffle" phase happen?
a. Before the map stage
b. After the map stage and before the reduce stage
c. After the reduce stage
d. None of the above
15. How does Hadoop support high availability for the namenode?
a. Via the secondary namenode
b. A standby namenode only in proprietary Hadoop versions
c. A standby namenode in open source or proprietary Hadoop versions
d. N/A. There is no native Hadoop support for highly available namenodes
For answers, see this posting.