The scale of each reducer's input depends on the Partitioner. You can think of
the Partitioner as a hash function and each reducer as a bucket, so you cannot
expect every bucket to hold the same number of items.
A skewed data distribution will make a few reducers take much more time.
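To make the bucket analogy concrete, here is a toy sketch in Python (not Hadoop code; the modulo formula mirrors what Hadoop's default HashPartitioner does, and the key names and counts are made up) showing how a hot key sends most records to one "bucket":

```python
# Toy sketch (not Hadoop code): hash-partitioning keys the way the
# default HashPartitioner does -- (hash & MAX_INT) % numReducers --
# to show that an equal number of partitions does not mean an equal
# number of records per partition.
from collections import Counter

def partition(key, num_reducers):
    # Mirrors Hadoop's (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# A skewed key distribution: one hypothetical "hot" key dominates.
records = ["hot_key"] * 900 + [f"key_{i}" for i in range(100)]

counts = Counter(partition(k, 4) for k in records)
# All 900 "hot_key" records hash identically and land in a single
# partition, so one reducer gets the bulk of the work.
print(sorted(counts.values(), reverse=True))
```

Whichever reducer draws the hot key processes at least 900 of the 1000 records, which is exactly the kind of skew that makes one reduce task run far longer than its siblings.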
2010/6/18 李钰 <[EMAIL PROTECTED]>:
> Hi Jeff and Amogh,
> Thanks for your comments! In my understanding, in the partitioning phase
> before spilling to disk, the threads divide the data into partitions
> corresponding to the number of reducers, as described in the Definitive
> Guide. So I think the scale of the input data should be the same for each
> reducer. If I have misunderstood anything here, please correct me, thanks.
> As to the reduce phases, I did check the times of shuffle, sort and reduce
> through the JT UI, but found them quite different for each reduce task. Some
> tasks have a longer shuffle time but a shorter reduce time, while others have
> a shorter shuffle time but a longer reduce time. I set the reducer number
> large enough to let all reduce tasks run in parallel, and set the
> "mapred.reduce.slowstart.completed.maps" parameter to 1.0 so that they start
> at the same time, once all map tasks have finished; I think this should
> reduce the impact of the network and of waiting for map tasks to finish
> during the shuffle phase. Then why is the time spent in shuffle still so
> different? And since the reduce phase of reduce is just writing sorted data
> into HDFS, why does the time of the reduce phase differ?
> Is anything wrong with my analysis? Any suggestions? Thanks a lot.
> Dear all,
> Any other comments? Thanks.
> Best Regards,
On 2010/6/18 at 11:39 AM, Amogh Vasekar <[EMAIL PROTECTED]> wrote:
>> >>Since the scale of input data and operations of each reduce task is the
>> same, what may cause the execution time of reduce tasks different?
>> You should consider looking at the copy, shuffle and reduce times
>> separately in the JT UI to get better information. Many (dynamic) factors,
>> such as network congestion, the number of mappers a reducer is fetching
>> from, data skew w.r.t. the input keys of a reducer, etc., will affect this
>> number.
>> On 6/18/10 8:05 AM, "李钰" <[EMAIL PROTECTED]> wrote:
>> Hi Todd and Jeff,
>> Thanks a lot for your discussion, it's really helpful to me. I'd like to
>> express my special appreciation for Todd's patient explanation; you helped
>> me see the working mechanism of SORT more clearly. And Jeff, thank you for
>> reminding me that sort uses TotalOrderPartitioner to do the partitioning.
>> Based on your discussion I update my understanding as follows:
>> The sorting happens on the map side during the spill process of each map
>> task. After that, the overall map outputs are partitioned by the
>> TotalOrderPartitioner, which decides the input range of each reducer.
>> Reducers fetch the map outputs assigned to them by the partitioner, merge
>> them, and write the results into HDFS.
>> Is this understanding right? Please correct me if you find any faults,
>> thanks.
>> If this understanding is right, then my question rolls back to the original
>> one: since the scale of the input data and the operations of each reduce
>> task are the same, what may cause the execution times of the reduce tasks
>> to differ? All nodes used in my experiment are on the same rack, and they
>> are homogeneous.
>> Any suggestion will be highly appreciated, thanks.
>> Best Regards,
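The range-partitioning understanding described above can be sketched in Python (this is not Hadoop's actual TotalOrderPartitioner, whose real split points come from sampling the input, e.g. via InputSampler; the split points and keys below are hypothetical):

```python
# Toy sketch of range partitioning: keys are assigned to reducers by
# comparing against sorted split points, so each reducer receives a
# contiguous key range and concatenating the reducers' sorted outputs
# yields a globally sorted result.
import bisect

# Hypothetical split points for 4 reducers (3 boundaries).
split_points = ["g", "n", "t"]

def range_partition(key):
    # Reducer index = how many split points the key sorts after.
    return bisect.bisect_right(split_points, key)

keys = ["apple", "melon", "zebra", "grape", "tiger"]
partitions = [sorted(k for k in keys if range_partition(k) == i)
              for i in range(len(split_points) + 1)]
# Concatenating the per-reducer sorted partitions gives global order.
print([k for p in partitions for k in p])
# ['apple', 'grape', 'melon', 'tiger', 'zebra']
```

Note that the ranges are equal in *key space*, not in *record count*: if the sampled split points are a poor fit for the actual data, some reducers still receive more records than others.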
>> 2010/6/18 Todd Lipcon <[EMAIL PROTECTED]>
>> > On Thu, Jun 17, 2010 at 9:37 AM, Jeff Zhang <[EMAIL PROTECTED]> wrote:
>> > > Todd,
>> > >
>> > > Why is there sorting in the map task? The sorting here seems useless to
>> > > me.
>> > >
>> > >
>> > For map-only jobs there isn't. For jobs with a reduce phase, the number
>> > of reduce tasks is typically smaller than the number of map tasks, so
>> > parallelizing the sort on the mappers and just doing a merge on the
>> > reducers is beneficial.