Monday 6 January 2014

Shuffle and Sort


MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system performs the sort and transfers the map outputs to the reducers as inputs is known as the shuffle. In many ways, the shuffle is the heart of MapReduce.


The Map Side


When the map function starts producing output, it is not simply written to disk; the process involves buffering writes in memory and some presorting. Each map task writes its output to a circular memory buffer (100 MB by default) assigned to it. When the contents of the buffer reach a certain threshold size, a background thread starts to spill the contents to disk. Map outputs continue to be written to the buffer while the spill takes place, but if the buffer fills up during this time, the map blocks until the spill is complete. Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to.
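As a rough sketch of what the spill thread does (this is an illustrative Python analogue, not Hadoop's actual implementation), the buffered records are first divided into partitions, here with a hash-mod partitioner like Hadoop's default HashPartitioner, and then presorted by key within each partition:

```python
def partition_for(key, num_reducers):
    # Mimics Hadoop's default HashPartitioner: hash of the key
    # modulo the number of reducers.
    return hash(key) % num_reducers

def spill(buffer, num_reducers):
    """Group buffered (key, value) pairs by partition and sort each
    partition by key, as the background spill thread does before
    writing a spill file to disk."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in buffer:
        partitions[partition_for(key, num_reducers)].append((key, value))
    for p in partitions:
        p.sort(key=lambda kv: kv[0])  # presort within each partition
    return partitions
```

Each partition of a spill file is therefore internally sorted, which is what makes the later merges cheap.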

Each time the memory buffer reaches the spill threshold, a new spill file is created, so after the map task has written its last output record, there could be several spill files. Before the task is finished, the spill files are merged into a single partitioned and sorted output file.
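Because every spill file is already sorted within each partition, combining them is a k-way merge rather than a re-sort. A minimal Python sketch of that final merge (treating each spill run as an in-memory list of sorted pairs):

```python
import heapq

def merge_spills(spill_runs):
    """Merge several sorted spill runs (lists of (key, value) pairs)
    into one sorted stream, as the map task's final merge produces
    its single partitioned, sorted output file."""
    return list(heapq.merge(*spill_runs, key=lambda kv: kv[0]))
```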

It is often a good idea to compress the map output as it is written to disk because doing so makes it faster to write to disk, saves disk space, and reduces the amount of data to transfer to the reducer. By default, the output is not compressed, but it is easy to enable this by setting mapred.compress.map.output to true.
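For example, compression could be switched on in mapred-site.xml; the property names below follow the old-style API used in this post, and SnappyCodec is just one common codec choice:

```xml
<!-- Enable compression of map output before it is spilled to disk -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```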

The Reduce Side


The reduce task needs the map output for its particular partition from several map tasks across the cluster. The map tasks may finish at different times, so the reduce task starts copying their outputs as soon as each completes. This is known as the copy phase of the reduce task. The reduce task has a small number of copier threads so that it can fetch map outputs in parallel. The default is five threads, but this number can be changed by setting the mapred.reduce.parallel.copies property.
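The copy phase amounts to fanning fetches over a small thread pool. A hedged sketch in Python, where `fetch` stands in for whatever HTTP fetch the framework performs against each map host:

```python
from concurrent.futures import ThreadPoolExecutor

def copy_phase(map_hosts, fetch, parallel_copies=5):
    """Fetch this reducer's partition from every completed map task,
    using a small pool of copier threads (default five, cf. the
    mapred.reduce.parallel.copies property)."""
    with ThreadPoolExecutor(max_workers=parallel_copies) as pool:
        return list(pool.map(fetch, map_hosts))
```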

The map outputs are copied to the reduce task JVM’s memory if they are small enough (the buffer’s size is controlled by mapred.job.shuffle.input.buffer.percent, which specifies the proportion of the heap to use for this purpose); otherwise, they are copied to disk. When the in-memory buffer reaches a threshold size (controlled by mapred.job.shuffle.merge.percent) or reaches a threshold number of map outputs (mapred.inmem.merge.threshold), it is merged and spilled to disk. If a combiner is specified, it will be run during the merge to reduce the amount of data written to disk. As the copies accumulate on disk, a background thread merges them into larger, sorted files. This saves some time merging later on. Note that any map outputs that were compressed (by the map task) have to be decompressed in memory in order to perform a merge on them.
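The count-based trigger can be sketched as follows (a simplified model, not Hadoop source; the real trigger is also size-based, and the spills go to disk rather than a list): copied map outputs accumulate in memory, and once the threshold number of outputs is reached they are merged, optionally combined, and spilled.

```python
def shuffle_copy(map_outputs, merge_threshold, combiner=None):
    """Accumulate copied map outputs in memory; once `merge_threshold`
    outputs are buffered (cf. mapred.inmem.merge.threshold), merge
    them, run the optional combiner, and spill the result.
    Returns (spills, still-in-memory outputs)."""
    in_memory, spills = [], []
    for output in map_outputs:          # each output: sorted [(k, v), ...]
        in_memory.append(output)
        if len(in_memory) >= merge_threshold:
            merged = sorted(kv for run in in_memory for kv in run)
            if combiner is not None:
                merged = combiner(merged)  # shrink data before writing
            spills.append(merged)
            in_memory = []
    return spills, in_memory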

When all the map outputs have been copied, the reduce task moves into the sort phase (which should properly be called the merge phase, as the sorting was carried out on the map side), which merges the map outputs, maintaining their sort ordering. This is done in rounds. For example, if there were 50 map outputs and the merge factor was 10 (the default, controlled by the io.sort.factor property, just like in the map’s merge), there would be five rounds. Each round would merge 10 files into one, so at the end there would be five intermediate files. Rather than have a final round that merges these five files into a single sorted file, the merge saves a trip to disk by directly feeding the reduce function in what is the last phase: the reduce phase. This final merge can come from a mixture of in-memory and on-disk segments.
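The round structure described above can be sketched directly (again a simplified model; Hadoop's actual merge adjusts the size of the first round so the last one hits the merge factor exactly):

```python
import heapq

def merge_rounds(runs, factor=10):
    """Merge sorted runs in rounds of up to `factor` at a time
    (cf. io.sort.factor). The final merge is not written back to
    disk; it is returned as a stream that feeds the reduce phase."""
    while len(runs) > factor:
        # One round: merge each group of `factor` runs into one
        # intermediate run.
        runs = [list(heapq.merge(*runs[i:i + factor]))
                for i in range(0, len(runs), factor)]
    # Final merge feeds the reduce function directly.
    return heapq.merge(*runs)
```

With 50 single-partition runs and the default factor of 10, one round produces the five intermediate files from the example, and the final five-way merge streams straight into reduce.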

During the reduce phase, the reduce function is invoked for each key in the sorted output. The output of this phase is written directly to the output filesystem, typically HDFS. In the case of HDFS, because the tasktracker node (or node manager) is also running a datanode, the first block replica will be written to the local disk.
