NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
MapReduce >> mail # dev >> Sorting in Mapper to Reducer


Re: Sorting in Mapper to Reducer
I believe some good web resources are:

   - http://www.slideshare.net/cloudera/mr-perf
   - http://gbif.blogspot.de/2011/01/setting-up-hadoop-cluster-part-1-manual.html
     (look at the "The Map Side" section)
   - This chapter from T. White's Hadoop book:
     https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
   - An explanation of the map task internals:
     http://codrspace.com/b441berith/hadoop-maptask-inside/

Basically, the key/value pairs emitted by the map function are accumulated in
an in-memory buffer (the MapOutputBuffer class). When the buffer fills up
(more precisely, when its contents cross a configurable spill threshold), the
records are sorted first by partition and, within each partition, by key, and
then written to a temporary file called a spill. The in-memory sorting
algorithm used is quicksort. When the map task has finished processing its
input split, there may be several spill files, which must be merged into one
single partitioned and sorted file before it can be served to the reduce tasks.
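To make the sort-and-spill flow above concrete, here is a minimal, self-contained Java sketch. It is a toy model under stated assumptions, not Hadoop's MapOutputBuffer: the class and method names (MapSideSortSketch, collect, spill, flushAndMerge) are hypothetical, the buffer capacity is counted in records rather than bytes, and each "spill" is kept as an in-memory list instead of being written to a temporary file. Records are assigned to partitions with the same formula as Hadoop's default HashPartitioner, each spill is sorted by (partition, key) with a hand-written quicksort, and the spills are combined at the end with a heap-based k-way merge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of map-side sort, spill and merge (hypothetical names, not Hadoop's API). */
public class MapSideSortSketch {
    /** One map output record tagged with the partition (reduce task) it belongs to. */
    static final class Rec {
        final int partition;
        final String key, value;
        Rec(int partition, String key, String value) {
            this.partition = partition; this.key = key; this.value = value;
        }
    }

    /** Order used for spilling and merging: by partition first, then by key. */
    static final Comparator<Rec> ORDER = (x, y) -> {
        int c = Integer.compare(x.partition, y.partition);
        return c != 0 ? c : x.key.compareTo(y.key);
    };

    private final int capacity;                        // assumption: records, not bytes
    private final int numReducers;
    private final List<Rec> buffer = new ArrayList<>();
    final List<List<Rec>> spills = new ArrayList<>();   // stand-in for spill files

    MapSideSortSketch(int capacity, int numReducers) {
        this.capacity = capacity;
        this.numReducers = numReducers;
    }

    /** Called for every key/value pair the map function emits. */
    void collect(String key, String value) {
        // Same formula as Hadoop's default HashPartitioner.
        int partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
        buffer.add(new Rec(partition, key, value));
        if (buffer.size() >= capacity) spill();
    }

    /** Sorts the buffered records with quicksort and moves them into a new spill. */
    void spill() {
        if (buffer.isEmpty()) return;
        quicksort(buffer, 0, buffer.size() - 1);
        spills.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    /** In-place quicksort: Hoare-style partitioning around the middle element. */
    static void quicksort(List<Rec> a, int lo, int hi) {
        if (lo >= hi) return;
        Rec pivot = a.get((lo + hi) >>> 1);
        int i = lo, j = hi;
        while (i <= j) {
            while (ORDER.compare(a.get(i), pivot) < 0) i++;
            while (ORDER.compare(a.get(j), pivot) > 0) j--;
            if (i <= j) {
                Rec tmp = a.get(i);
                a.set(i, a.get(j));
                a.set(j, tmp);
                i++;
                j--;
            }
        }
        quicksort(a, lo, j);
        quicksort(a, i, hi);
    }

    /** End of the input split: flush the buffer, then k-way merge all spills. */
    List<Rec> flushAndMerge() {
        spill();
        // Heap entries are {spill index, position in that spill}, ordered by the record they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> ORDER.compare(spills.get(x[0]).get(x[1]), spills.get(y[0]).get(y[1])));
        for (int s = 0; s < spills.size(); s++) heap.add(new int[] {s, 0}); // spills are never empty
        List<Rec> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<Rec> run = spills.get(top[0]);
            merged.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        MapSideSortSketch m = new MapSideSortSketch(3, 2);
        for (String w : new String[] {"pear", "apple", "fig", "kiwi", "apple", "date", "plum"}) {
            m.collect(w, "1");
        }
        List<Rec> merged = m.flushAndMerge();
        System.out.println(m.spills.size() + " spills");
        for (Rec r : merged) System.out.println(r.partition + "\t" + r.key);
    }
}
```

With a capacity of 3 records and 7 emitted words, main produces three spills (3 + 3 + 1 records) and one merged list ordered by partition, then key. The real framework differs in the details; for example, it merges at most io.sort.factor spill streams per pass and may run a combiner on each sorted spill before writing it, so treat this only as an illustration of the ordering guarantees.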

Best,

Dusso
2014-05-24 16:10 GMT+02:00 Knowledge gatherer <
[EMAIL PROTECTED]>:
 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB