Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop >> mail # user >> Limit number of records or total size in combiner input using jobconf?


Copy link to this message
-
Limit number of records or total size in combiner input using jobconf?
Hello,
Running  a MR job on 7 machines failed when it came to processing
53GB. Browsing the errors,
org.saptarshiguha.rhipe.GRMapreduce$GRCombiner.reduce(GRMapreduce.java:149)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:1106)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:979)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:876)
The reason why my line failed is that there were too many records. I
offload calculations to a another program and it screamed out of
memory.
Seeing the source in sortAndSpill where this happened:(hadoop -0.19)
              int spstart = spindex;
              while (spindex < endPosition &&
                  kvindices[kvoffsets[spindex % kvoffsets.length]
                            + PARTITION] == i) {
                ++spindex;
              }
              // Note: we would like to avoid the combiner if we've fewer
              // than some threshold of records for a partition
              if (spstart != spindex) {
                combineCollector.setWriter(writer);
                RawKeyValueIterator kvIter = new
MRResultIterator(spstart, spindex);
                combineAndSpill(kvIter, combineInputCounter);
              }
So here are my questions:
(1) is there a  jobconf hint to limit the number of records in kviter?
I can (and have) made a fix to my code that processes the values in a
combiner step in batches (i.e takes N at a go,processes that and
repeat), but was wondering if i could just set an option.

Since this occurred in the MapContext, changing the number of reducers
wont help.
(2) How does changing the number of reducers help at all? I have 7
machines, so I feel 11 (a prime close to 7, why a prime?) is good
enough (some machines are 16GB others 32GB)

Regards
Saptarshi
--
Saptarshi Guha - [EMAIL PROTECTED]