MapReduce >> mail # user >> How / when does On-disk merge work?


2013-10-25, 19:35
Re: How / when does On-disk merge work?
Hi!

Tom White's "Hadoop: The Definitive Guide" is probably the best source of information on this, apart from the code itself ;-) If you are so inclined, take a look at MergeManagerImpl.java.

HTH
Ravi  

On Friday, October 25, 2013 2:36 PM, - <[EMAIL PROTECTED]> wrote:
 
Hi All,

Can anyone point me to documentation that explains in detail how the on-disk merge in the reduce phase works in Hadoop 2.2.0?
There is an explanation on this page, but I am afraid it could be outdated: what I observe in my log files is a lot of "OnDiskMerger - Thread to merge on-disk map-outputs" work at the end of the merge phase.

Thanks,
-