

Re: Performance tuning of sort
Todd,

Why is there a sort in the map task? The sort there seems useless to me.

On Thu, Jun 17, 2010 at 9:26 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> On Thu, Jun 17, 2010 at 12:43 AM, Jeff Zhang <[EMAIL PROTECTED]> wrote:
>
>> Your understanding of Sort is not right. The key concept of Sort is
>> the TotalOrderPartitioner. Before the map-reduce job runs, the client
>> side samples the input data to estimate its distribution. The mapper
>> does nothing; each reducer fetches its data according to the
>> TotalOrderPartitioner. The data within each reducer is locally sorted,
>> and the reducers themselves are ordered (r0 < r1 < r2 ...), so the
>> overall result is sorted.
>>
>
> The sorting happens on the map side, actually, during the spill process. The
> mapper itself is an identity function, but the map task code does perform a
> sort (on a <partition,key> tuple) as originally described in this thread.
> Reducers just do a merge of mapper outputs.
>
> -Todd
>
>
>>
>>
>>
>> On Thu, Jun 17, 2010 at 12:13 AM, 李钰 <[EMAIL PROTECTED]> wrote:
>> > Hi all,
>> >
>> > I'm doing some tuning of the Hadoop sort benchmark. To be more specific,
>> > I'm running tests against the org.apache.hadoop.examples.Sort class. From
>> > reading the source code, I think the map tasks are responsible for
>> > sorting the input data, and the reduce tasks just merge the map outputs
>> > and write them into HDFS. But there is one thing I can't explain: the
>> > time spent in the reduce phase of each reduce task, that is, writing data
>> > into HDFS, differs from task to task. Since the input data and operations
>> > of each reduce task are the same, what would cause the execution times to
>> > differ? Is there anything wrong with my understanding? Does anybody have
>> > any experience with this? I badly need your help, thanks.
>> >
>> > Best Regards,
>> > Carp
>> >
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
Best Regards

Jeff Zhang
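
Editor's sketch: the division of labour Todd describes (map tasks sort their output on a <partition,key> tuple, reducers only merge) can be illustrated with a small stand-alone Java program. This is a simplified model, not Hadoop's actual spill or merge code. It uses no Hadoop APIs, and the class name, the method names, and the split points "h" and "p" are all hypothetical, chosen only for illustration. The split points stand in for the boundaries that a range partitioner such as the TotalOrderPartitioner would obtain from sampling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified model of "sort on the map side, merge on the reduce side".
// All names here are hypothetical; none of this is Hadoop code.
final class MapSideSortSketch {

    /** One map-output record: the reducer (partition) it is bound for, plus its key. */
    static final class Rec {
        final int partition;
        final String key;

        Rec(int partition, String key) {
            this.partition = partition;
            this.key = key;
        }
    }

    /** Range partitioning: count how many split points are <= key. */
    static int partitionFor(String key, String[] splitPoints) {
        int p = 0;
        while (p < splitPoints.length && key.compareTo(splitPoints[p]) >= 0) {
            p++;
        }
        return p;
    }

    /** Map side: tag each key with its partition, then sort by (partition, key). */
    static List<Rec> mapSideSort(List<String> keys, String[] splitPoints) {
        List<Rec> out = new ArrayList<>();
        for (String k : keys) {
            out.add(new Rec(partitionFor(k, splitPoints), k));
        }
        out.sort(Comparator.comparingInt((Rec r) -> r.partition).thenComparing(r -> r.key));
        return out;
    }

    /** Reduce side: k-way merge of the already-sorted runs bound for one partition. */
    static List<String> reduceMerge(List<List<Rec>> mapOutputs, int partition) {
        // Each map output contributes one sorted run for this partition.
        List<List<String>> runs = new ArrayList<>();
        for (List<Rec> output : mapOutputs) {
            List<String> run = new ArrayList<>();
            for (Rec r : output) {
                if (r.partition == partition) {
                    run.add(r.key);
                }
            }
            if (!run.isEmpty()) {
                runs.add(run);
            }
        }
        // Heap entries are {runIndex, position}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] e) -> runs.get(e[0]).get(e[1])));
        for (int i = 0; i < runs.size(); i++) {
            heap.add(new int[] {i, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            List<String> run = runs.get(e[0]);
            merged.add(run.get(e[1]));
            if (e[1] + 1 < run.size()) {
                heap.add(new int[] {e[0], e[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String[] splitPoints = {"h", "p"}; // hypothetical boundaries, giving 3 reducers
        List<List<Rec>> mapOutputs = Arrays.asList(
                mapSideSort(Arrays.asList("pear", "apple", "kiwi", "zebra"), splitPoints),
                mapSideSort(Arrays.asList("mango", "banana", "quince", "cherry"), splitPoints));
        for (int p = 0; p <= splitPoints.length; p++) {
            System.out.println("reducer " + p + ": " + reduceMerge(mapOutputs, p));
        }
    }
}
```

In this model the map-side sort is what lets each reducer get away with a merge: every run it receives is already in key order, so the reducer never has to sort its whole input. Concatenating the reducer outputs in partition order gives a globally sorted result.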