MapReduce >> mail # user >> How does map-merge work exactly?


Martin Dobmeier 2012-09-13, 14:04
Re: How does map-merge work exactly?
On Thu, Sep 13, 2012 at 7:04 AM, Martin Dobmeier
<[EMAIL PROTECTED]> wrote:
> What exactly is a segment? Is it the number of spills?

A segment in this context is the portion of one spill's output that is
destined for a particular reduce. Each spill contains one segment for
every reduce.
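
To make the layout concrete, here is a minimal Python sketch (not
Hadoop code). It models a spill as a mapping from reduce number to a
sorted run of records, where each run is one segment. The names
`make_spill` and `num_reduces`, and partitioning by `key % num_reduces`,
are illustrative assumptions standing in for the job's real partitioner.

```python
def make_spill(records, num_reduces):
    """Model one spill: partition records by reduce, then sort each partition.

    Returns {reduce_id: sorted list of (key, value)}. Every reduce gets a
    segment, even if it is empty. Partitioning by key % num_reduces is a
    hypothetical stand-in for the job's partitioner.
    """
    segments = {r: [] for r in range(num_reduces)}
    for key, value in records:
        segments[key % num_reduces].append((key, value))
    for seg in segments.values():
        seg.sort()
    return segments


spill = make_spill([(5, "a"), (2, "b"), (4, "c"), (1, "d")], num_reduces=3)
```

With three reduces, this one spill holds three segments, one of which
(for reduce 0) is empty.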

> What does "0 segments left" mean? Does it mean that the merge could be
> performed on the first pass?
> Why are only 54 segments merged instead of "io.sort.factor" segments?

The intermediate merge combines 54 segments into 1, which reduces the
count from 117 to 117 - 53 = 64 segments. The final merge then runs
over those 64 segments.
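
The arithmetic can be sketched in Python. This is a sketch of a
pass-sizing rule that reproduces the numbers in the log, not a copy of
the Hadoop source. It assumes io.sort.factor was 64, which this message
does not state, and the function names are hypothetical. The idea is
that the first pass is made just wide enough that the last merge gets
exactly io.sort.factor inputs.

```python
def first_pass_width(num_segments, factor):
    """Segments merged in the first pass (sketch; assumes factor >= 2)."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    if num_segments <= factor:
        return num_segments  # everything fits in a single merge
    mod = (num_segments - 1) % (factor - 1)
    return factor if mod == 0 else mod + 1


def merge_plan(num_segments, factor):
    """Width of each merge pass, ending with the final merge."""
    widths = []
    remaining = num_segments
    first = True
    while remaining > factor:
        width = first_pass_width(remaining, factor) if first else factor
        widths.append(width)
        remaining -= width - 1  # `width` inputs are replaced by 1 output
        first = False
    widths.append(remaining)
    return widths


plan = merge_plan(117, 64)  # 64 is the assumed io.sort.factor
```

Under that assumption, `plan` is one intermediate pass over 54 segments
followed by a final merge over 64.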

> (io.sort.factor determines the number of files to merge during a pass,
> right?)
> Why is the merge performed "number of reducers" times? (I'm counting the
> phrase "Merging 117 segments" exactly 96 times)

Each invocation of the merger combines all the output that the
partitioner assigned to one reduce, so the merge runs once per
reduce. -C
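
A rough Python illustration of the per-reduce merge, again a sketch and
not Hadoop code. Each spill is a hand-written mapping from reduce number
to a sorted segment, and `merge_for_reduce` and `merge_all` are
hypothetical names.

```python
import heapq


def merge_for_reduce(spills, reduce_id):
    """Merge the segment for one reduce from every spill into one sorted run."""
    return list(heapq.merge(*(spill[reduce_id] for spill in spills)))


def merge_all(spills, num_reduces):
    """Run the merger once per reduce, as the map task does at the end."""
    return {r: merge_for_reduce(spills, r) for r in range(num_reduces)}


# Three spills, two reduces: each merge sees three segments.
spills = [
    {0: [(3, "x")], 1: [(1, "a"), (7, "b")]},
    {0: [(0, "y")], 1: [(4, "c")]},
    {0: [], 1: [(2, "d")]},
]
merged = merge_all(spills, num_reduces=2)
```

Here the merger runs twice, each time over 3 segments. In the log from
the question, the same pattern shows up as "Merging 117 segments"
printed 96 times.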