Big Data Quiz

1.  What does EDH stand for?

a.  Enterprise Data Hub
b.  Extract Develop Hadoop
c.  Extract Decide Haul
d.  Extract Data Hadoop

2.  Gartner, Informatica and MapR think "data lakes" should be referred to as what?

a.  data warehouses
b.  data dams
c.  data mills
d.  data reservoirs

3.  MapReduce is to Hadoop as ___________ is to Spark

a.  Storm
b.  Vertice Algorithm
c.  Directed Acyclic Graph
d.  RDD
e.  Memory
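For readers less familiar with the MapReduce side of the analogy in question 3, here is a minimal, single-process Python sketch of the map, shuffle and reduce phases for a word count. It only illustrates the programming model. It is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Emit a (word, 1) pair for every whitespace-separated word in every line.
    return [(word, 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    # Combine each key's list of values into a single total.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver chaining the three phases in one process.
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real Hadoop job the map and reduce functions run on many nodes and the shuffle moves data across the network; here everything happens in memory in one process.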

4.  RDD stands for what in Spark?

a.  Really Different Data
b.  Resilient Distributed Dataset
c.  Real Developed Data
d.  Reliable Data Distribution

5.  Which three file systems are recommended as the underlying local file system beneath HDFS?

a.  cifs
b.  ext3
c.  ext4
d.  gfs
e.  hfs
f.  JFS
g.  nfs
h.  reiserfs
i.  vfat
j.  XFS

6.  If a Hadoop cluster had nodes that cost $15,000 each, would an HP Vertica solution or a Teradata solution cost more or less than that cluster?  Choose two.

a.  HP Vertica would be cheaper
b.  HP Vertica would be more expensive
c.  Teradata would be cheaper
d.  Teradata would be more expensive

7.  What is "a scalable and fault-tolerant stream processing engine built on the Spark SQL engine"?

a.  Structured streaming
b.  Beam
c.  Continual application
d.  Storm

8.  What is a framework that allows you to implement streaming and batch data processing jobs that can run on any execution engine?

a.  Apache Apex
b.  Apache Beam
c.  Apache Cassandra
d.  Apache Flink
e.  Apache Storm

9.  Which of the following do not need Hadoop?  Choose two.

a.  Apache Apex
b.  Apache Flink
c.  Apache Spark
d.  Apache Tez

10.  Which of the following is a "Hadoop YARN native platform" (thus dependent on Hadoop) and a type of "unified stream and batch processing engine"?

a.  Apache Apex
b.  Apache Beam
c.  Apache Cassandra
d.  Apache Delta
e.  Apache Flink

11.  What company, founded by the people who created Apache Spark, provides a commercial version of Spark?

a.  Data Pipeline Gurus, LLC
b.  Databricks
c.  Hotfire Software
d.  Zephyr Data

12.  What is Microsoft's version of Hadoop?

a.  MS Knowledge
b.  BigTable
c.  HDInsight
d.  Datica
e.  Kinesis

13.  What are examples of a Directed Acyclic Graph?

a.  A typical ETL process
b.  The npm package manager
c.  Spark operating on RDDs via stages, which involve sub-tasks
d.  Apache Airflow's pythonic schedule of phases for dynamic processing
e.  All of the above
f.  None of the above
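The idea behind question 13 can be made concrete. A DAG has directed edges and no cycles, so its nodes can always be placed in an order where every task runs after everything it depends on. The following Python sketch uses Kahn's algorithm on a hypothetical ETL pipeline (the task names are made up for illustration). It is not how Spark or Airflow schedule work internally.

```python
from collections import deque
from typing import Dict, List


def topological_order(edges: Dict[str, List[str]]) -> List[str]:
    """Return one valid execution order for a graph given as {task: [downstream tasks]}.

    Raises ValueError if the graph contains a cycle (i.e., it is not a DAG).
    """
    # Collect every node, including ones that only appear as downstream targets.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    # Count incoming edges for each node.
    in_degree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1

    # Kahn's algorithm: repeatedly take a node with no remaining inputs.
    # Sorting the initial nodes makes the output deterministic.
    ready = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in edges.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    # Any node never reached had an unresolved input, which means a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


if __name__ == "__main__":
    # Hypothetical ETL pipeline: two extracts feed one transform, which feeds a load.
    etl = {
        "extract_orders": ["transform"],
        "extract_customers": ["transform"],
        "transform": ["load"],
    }
    print(topological_order(etl))
```

If an edge such as "load" -> "extract_orders" were added, the graph would no longer be acyclic and the function would raise instead of returning an order.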

For answers, see this posting.