Hadoop, mail # user - Limit number of records or total size in combiner input using jobconf?

Limit number of records or total size in combiner input using jobconf?
Saptarshi Guha 2009-02-13, 14:41
Running an MR job on 7 machines failed when it came to processing
53GB. Browsing the errors, I found this stack trace:
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:1106)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:979)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:876)
The reason my line failed is that there were too many records. I
offload calculations to another program, and it screamed out of memory.
Looking at the source of sortAndSpill where this happened (Hadoop 0.19):
              int spstart = spindex;
              while (spindex < endPosition &&
                  kvindices[kvoffsets[spindex % kvoffsets.length]
                            + PARTITION] == i) {
                ++spindex;
              }
              // Note: we would like to avoid the combiner if we've fewer
              // than some threshold of records for a partition
              if (spstart != spindex) {
                RawKeyValueIterator kvIter =
                    new MRResultIterator(spstart, spindex);
                combineAndSpill(kvIter, combineInputCounter);
              }
So here are my questions:
(1) Is there a jobconf hint to limit the number of records in kvIter?
I can fix (and have fixed) my code so that the combiner step processes
the values in batches (i.e., takes N at a time, processes them, and
repeats), but I was wondering whether I could just set an option.
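
The batching workaround in (1) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain java.util.Iterator rather than the Hadoop combiner API, and BatchedCombine, combineInBatches, and processBatch are hypothetical names, not anything from Hadoop. It also assumes the per-batch results can be merged later (for example in the reducer), since each batch yields its own partial result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Hadoop: drains an iterator in fixed-size
// batches so that at most batchSize values are buffered at any one time.
final class BatchedCombine {
    static <V, R> List<R> combineInBatches(Iterator<V> values, int batchSize,
                                           Function<List<V>, R> processBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<R> partials = new ArrayList<>();
        List<V> batch = new ArrayList<>(batchSize);
        while (values.hasNext()) {
            batch.add(values.next());
            if (batch.size() == batchSize) {
                // Hand a full batch to the (possibly external) computation.
                partials.add(processBatch.apply(batch));
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Process the final, possibly smaller, batch.
            partials.add(processBatch.apply(batch));
        }
        return partials;
    }
}
```

In a real combiner each partial result would be emitted as its own record for the key, which is only safe if the downstream reduce can merge several partial results per key.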

Since this occurred on the map side (in MapTask), changing the number
of reducers won't help.
(2) How does changing the number of reducers help at all? I have 7
machines, so I feel 11 (a prime close to 7; why a prime?) is good
enough (some machines have 16GB, others 32GB).
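
As background for the "why a prime?" aside, here is a small sketch of how keys spread over reducers, assuming the job uses hash partitioning of the form (hashCode & Integer.MAX_VALUE) % numReduceTasks, as Hadoop's default HashPartitioner does. PartitionSpread and its methods are hypothetical names for illustration, and the evenly spaced integer keys are made-up test data, not data from this job.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not Hadoop code.
final class PartitionSpread {
    // Same arithmetic as hash partitioning: non-negative hash modulo reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Distinct partitions hit by the integer keys 0, stride, 2*stride, ...
    static Set<Integer> partitionsUsed(int keyCount, int stride, int numReduceTasks) {
        Set<Integer> used = new TreeSet<>();
        for (int k = 0; k < keyCount; k++) {
            // Integer.hashCode() is the value itself, so the key pattern shows through.
            used.add(partitionFor(Integer.valueOf(k * stride), numReduceTasks));
        }
        return used;
    }
}
```

With 100 keys spaced 4 apart, partitionsUsed(100, 4, 8) contains only 2 partitions, while partitionsUsed(100, 4, 11) contains all 11. A prime reducer count shares no factor with such regular key patterns, which is the usual argument for choosing one.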

Saptarshi Guha - [EMAIL PROTECTED]