Re: Limit number of records or total size in combiner input using jobconf?
Saptarshi Guha 2009-02-24, 02:36
On Fri, Feb 20, 2009 at 5:34 PM, Chris Douglas <[EMAIL PROTECTED]> wrote:
>> So here are my questions:
>> (1) is there a jobconf hint to limit the number of records in kviter?
>> I have already changed my code so that the combiner processes its
>> values in batches (i.e., takes N at a time, processes them, and
>> repeats), but was wondering if I could just set an option.
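For readers of the archive, the batching approach mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from this thread: the class name BatchedCombine, the helper names, and the choice of summation as the combine operation are all assumptions made for the example. It only works when the operation can be applied to partial groups and the partial results merged, which a combiner has to tolerate anyway.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: consume a value iterator in fixed-size batches so that
// at most batchSize values are held in memory at any one time.
class BatchedCombine {

    // Sums all values, processing them batchSize at a time.
    static long sumInBatches(Iterator<Long> values, int batchSize) {
        List<Long> batch = new ArrayList<>(batchSize);
        long total = 0;
        while (values.hasNext()) {
            batch.add(values.next());
            if (batch.size() == batchSize) {
                total += processBatch(batch); // fold the full batch into the result
                batch.clear();                // release the batch before reading more
            }
        }
        if (!batch.isEmpty()) {
            total += processBatch(batch);     // leftover partial batch
        }
        return total;
    }

    // Stand-in for the real per-batch work; here it just adds the values up.
    static long processBatch(List<Long> batch) {
        long sum = 0;
        for (long v : batch) {
            sum += v;
        }
        return sum;
    }
}
```

In a real combiner the iterator would be the values iterator handed to the combine call, and processBatch would emit or accumulate whatever partial result the job needs.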
> Approximately and indirectly, yes. You can limit the amount of memory
> allocated to storing serialized records (io.sort.mb) and the
> percentage of that space reserved for storing record metadata
> (io.sort.record.percent, IIRC). That can be used to limit the number of
> records in each spill, though you may also need to disable the combiner
> during the merge, where you may run into the same problem.
> You're almost certainly better off designing your combiner to scale well (as
> you have), since you'll hit this in the reduce, too.
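To make the relationship between those two settings and the record count concrete, here is a rough back-of-the-envelope sketch. It assumes that the map-side sort buffer spends 16 bytes of metadata per record, which is my recollection of the MapTask code of this era and should be checked against the version in use. The class and method names are invented for the example. The result is an upper bound on records per spill, since a spill can start before the buffer is full and large records can fill the data region first.

```java
// Hypothetical estimate of how many records fit in the map-side sort buffer
// before the metadata region is exhausted.
class SpillRecordEstimate {

    // Assumption: 16 bytes of accounting metadata per buffered record.
    static final int BYTES_PER_RECORD_METADATA = 16;

    // ioSortMb: value of io.sort.mb (buffer size in megabytes)
    // recordPercent: value of io.sort.record.percent (fraction used for metadata)
    static long maxBufferedRecords(int ioSortMb, double recordPercent) {
        long bufferBytes = (long) ioSortMb << 20;                 // MB to bytes
        long metadataBytes = (long) (bufferBytes * recordPercent); // metadata region
        return metadataBytes / BYTES_PER_RECORD_METADATA;
    }

    public static void main(String[] args) {
        // Print the estimate for two settings; a smaller percentage means fewer records.
        System.out.println(maxBufferedRecords(100, 0.05));
        System.out.println(maxBufferedRecords(100, 0.01));
    }
}
```

With the old JobConf API the two values themselves would be set with conf.setInt("io.sort.mb", ...) and conf.setFloat("io.sort.record.percent", ...). Lowering either one lowers the bound, at the cost of more spills.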
>> Since this occurred in the MapContext, changing the number of reducers
>> won't help.
>> (2) How does changing the number of reducers help at all? I have 7
>> machines, so I feel 11 (a prime close to 7, why a prime?) is good
>> enough (some machines have 16 GB, others 32 GB)
> Your combiner will look at all the records for a partition and only those
> records in a partition. If your partitioner distributes your records evenly
> in a particular spill, then increasing the total number of partitions will
> decrease the number of records your combiner considers in each call. For
> most partitioners, whether the number of reducers is prime should be
> irrelevant. -C
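The effect of the partition count can be illustrated with a small simulation. It uses the same formula as Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numPartitions, but everything else is an assumption for the example: the class name, the use of consecutive Integer keys (which hash perfectly evenly), and the record count. Real keys will be less uniform.

```java
// Hypothetical simulation: how many records land in the largest partition
// for a given number of reducers, using hash partitioning.
class PartitionSpread {

    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Partitions the keys 0..numRecords-1 and returns the largest partition's size.
    static int largestPartition(int numRecords, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numRecords; i++) {
            counts[partitionFor(Integer.valueOf(i), numPartitions)]++;
        }
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        int numRecords = 9240; // chosen so that it divides evenly by 7, 11 and 12
        for (int reducers : new int[] {7, 11, 12}) {
            System.out.println(reducers + " partitions: largest holds "
                    + largestPartition(numRecords, reducers) + " records");
        }
    }
}
```

With evenly hashed keys the largest partition shrinks as the partition count grows, and 12 partitions do at least as well as 11, which is the sense in which primality does not matter here.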