What is MapReduce in Hadoop?

Problem scenario
You want to learn what MapReduce does so you can grasp Hadoop more thoroughly.  What is MapReduce?

MapReduce is a core processing component of Hadoop.  In multi-node deployments, MapReduce distributes processing tasks across datanodes (servers controlled by a master server), moving the computation to where the data resides; the results of that processing are then retrievable thanks to MapReduce.  Conceptually, MapReduce has two main components: the mapper and the reducer.  The mapper function creates temporary (intermediate) key-value pairs.  The reducer is composed of three phases: shuffle, sort, and reduce; shuffle and sort happen simultaneously.  Together these components let multi-node deployments leverage commodity hardware.  If you want to learn more you can see this link.
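The mapper-shuffle-sort-reduce flow described above can be sketched outside of Hadoop.  The following is an illustrative word-count simulation in plain Python, not the actual Hadoop API; all function and variable names here are my own for demonstration purposes.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit an intermediate (word, 1) pair for each word.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    # Shuffle/sort: group all values by key and order the keys,
    # mimicking what the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Reduce phase: collapse all values for one key into a final pair.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for line in lines for pair in mapper(line)]
results = dict(reducer(k, vs) for k, vs in shuffle_and_sort(intermediate))
print(results)  # word counts, e.g. "the" appears three times
```

In a real cluster the mapper instances run in parallel on different nodes, and the shuffle moves each key's values over the network to the node running that key's reducer; this single-process version only illustrates the data flow.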

After you run a MapReduce job, you'll see output summarizing the job.  The summary includes various quantifiable aspects of the job (counters) in the "Map-Reduce Framework" section of the text output.

It is not hard to set up Hadoop and run a MapReduce job, and doing so will teach you more about the process.  To deploy Hadoop on a Red Hat derivative, see this link; to deploy Hadoop on Ubuntu, see this link.  To run a MapReduce job for learning purposes, follow this link.
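If you already have a working Hadoop installation, one way to try a first job is the word-count example that ships with Hadoop.  The commands below are a sketch: the jar path and the HDFS paths are assumptions that will vary with your Hadoop version and configuration.

```
# Put some input text into HDFS (paths are examples only)
hdfs dfs -mkdir -p /user/$USER/input
hdfs dfs -put localfile.txt /user/$USER/input

# Run the bundled word-count example job
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/$USER/input /user/$USER/output

# Inspect the reducer output
hdfs dfs -cat /user/$USER/output/part-r-00000
```

When the job finishes, the terminal output should include the "Map-Reduce Framework" summary section mentioned above.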
