This is to be expected if there is any kind of trend in the ordering of your data, or if the mappers do any real computation.
You can use a smaller input split to reduce the load on each individual mapper, so that large blocks of records that take a long time to process are less likely to clog one mapper.
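As a rough illustration of why a smaller split helps, here is a minimal, Hadoop-free sketch of the arithmetic. It assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), which is how FileInputFormat in the newer mapreduce API computes it as far as I know; the class name, the 64 MB block, the 1 GB file and the 16 MB cap are made-up values for the example. In a real job the cap would probably be set with something like FileInputFormat.setMaxInputSplitSize(job, bytes), though the exact call and property name depend on your Hadoop version.

```java
public class SplitSizeSketch {
    // Assumed formula: max(minSize, min(maxSize, blockSize)).
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of map tasks for one file (ceiling division).
    // Ignores any slack Hadoop allows on the final split.
    public static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;   // hypothetical HDFS block size
        long fileSize = 1024 * mb;  // hypothetical 1 GB input file

        // No cap: each split is one block.
        long defaultSplit = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        // Hypothetical 16 MB cap: each mapper gets a quarter of a block.
        long smallSplit = computeSplitSize(blockSize, 1, 16 * mb);

        System.out.println("default: " + numSplits(fileSize, defaultSplit) + " map tasks");
        System.out.println("capped:  " + numSplits(fileSize, smallSplit) + " map tasks");
    }
}
```

With these numbers the job goes from 16 map tasks to 64, so a run of expensive records is spread over more, shorter tasks instead of holding up a single mapper. The trade-off is more per-task startup overhead, so the cap is worth tuning rather than shrinking as far as possible.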
Jay Vyas MMSB UCHC
On Oct 2, 2012, at 9:04 PM, Huanchen Zhang <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a small portion of map tasks whose output is much larger than others (more spills). So the reducer is mainly waiting for these few map tasks. Is there a good solution for this problem?
>
> Thank you.
>
> Best,
> Huanchen