I believe some good web resources are:

   - http://www.slideshare.net/cloudera/mr-perf
     at the "The Map Side" section
   - This chapter from T. White's Hadoop book:
   - Explanation about the Map Task:
Basically, the key/value pairs emitted by the map function accumulate in an
in-memory buffer (the MapOutputBuffer class). When the buffer fills up, the
records are sorted first by partition and, within each partition, by key,
and are then written to a temporary file called a spill. The in-memory
sorting algorithm used is quicksort. When the map task has finished
processing its input split there may be many spills, which must be merged
into a single file before the output is available to the reduce tasks.
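
To make that flow concrete, here is a minimal sketch in Python. It is a toy
simulation, not Hadoop code: the names run_map_side, partition_of,
NUM_PARTITIONS and BUFFER_CAPACITY are made up for illustration, the buffer
is counted in records rather than bytes, spills are kept as in-memory lists
instead of temporary files, and Python's built-in sort (Timsort) stands in
for quicksort.

```python
import heapq
from zlib import crc32

NUM_PARTITIONS = 2    # hypothetical number of reduce tasks
BUFFER_CAPACITY = 4   # hypothetical buffer size, counted in records


def partition_of(key):
    """Toy stand-in for a hash partitioner (an assumption, not Hadoop's)."""
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


def sort_key(record):
    """Order records by partition first, then by key."""
    partition, key, _value = record
    return (partition, key)


def run_map_side(records):
    """Buffer map output, spill sorted runs, then merge the runs.

    Returns (spills, merged): the list of sorted spills and the single
    merged run that would be handed to the reduce side.
    """
    buffer, spills = [], []
    for key, value in records:
        buffer.append((partition_of(key), key, value))
        if len(buffer) >= BUFFER_CAPACITY:
            # Buffer is full: sort it and "write" it out as one spill.
            spills.append(sorted(buffer, key=sort_key))
            buffer = []
    if buffer:
        # Flush whatever is left when the input split is exhausted.
        spills.append(sorted(buffer, key=sort_key))
    # Each spill is already sorted, so a k-way merge yields one sorted run.
    merged = list(heapq.merge(*spills, key=sort_key))
    return spills, merged


if __name__ == "__main__":
    map_output = [("b", 1), ("a", 1), ("c", 1), ("a", 2), ("d", 1), ("b", 2)]
    spills, merged = run_map_side(map_output)
    print(len(spills), "spills")
    for record in merged:
        print(record)
```

With six records and a capacity of four, the sketch produces two spills (four
records, then two), and the merge step combines them into one run ordered by
partition and then by key, which is the shape the reduce tasks expect.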


2014-05-24 16:10 GMT+02:00 Knowledge gatherer <