August 3, 2017

Hadoop MapReduce

Hadoop MapReduce Introduction

MapReduce is the processing layer of Hadoop. MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.  You just need to put business logic in the way MapReduce works and rest things will be taken care by the framework. Work (complete job) which is submitted by the user to master is divided into small works (tasks) and assigned to slaves.

MapReduce programs are written in a particular style influenced by functional programming constructs, specifical idioms for processing lists of data. Here in map reduce we get input as a list and it converts it into output which is again a list. It is the heart of Hadoop. Hadoop is so much powerful and efficient due to map reduce as here parallel processing is done.

 MapReduce – High-level Understanding

Map-Reduce divides the work into small parts, each of which can be done in parallel on the cluster of servers. A problem is divided into a large number of smaller problems each of which is processed independently to give individual outputs. These individual outputs are further processed to give final output.

Hadoop Map-Reduce is highly scalable and can be used across many computers. Many small machines can be used to process jobs that normally could not be processed by a large machine.

Read More At Data Flair

Click Here!