Re: Disable Sorting?
The sort is what implements the group-by-key function. You can't
have one without the other in Hadoop. Are you trying to disable the
sort because you think it's too slow?
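
To illustrate why grouping depends on the sort, here is a minimal Python sketch. It is not Hadoop code: the function names and the sample map output are made up for illustration. It mimics a reducer that merges only adjacent records with equal keys, which is roughly what a sort-based shuffle relies on.

```python
from itertools import groupby
from operator import itemgetter


def reduce_sorted(pairs):
    """Sort by key first, then sum each run of adjacent equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, sum(value for _, value in group))
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_unsorted(pairs):
    """Same adjacent-run merging, but on the records in arrival order."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(pairs, key=itemgetter(0))]


# Hypothetical wordcount map output for one partition, in arrival order.
map_output = [("b", 1), ("a", 1), ("b", 1), ("a", 1), ("c", 1)]

sorted_result = reduce_sorted(map_output)
unsorted_result = reduce_unsorted(map_output)
```

In this sketch sorted_result has one entry per key, while unsorted_result still has five entries, because "a" and "b" never arrive as contiguous runs.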

-Joey

On Sun, Sep 11, 2011 at 2:43 AM, john smith <[EMAIL PROTECTED]> wrote:
> Hi Arun,
>
> Suppose I am doing a simple wordcount and the map-phase is over. After the
> shuffle, in each partition, the inputs to the reducer come in sorted order
> of keys. I want to disable this.
>
> Take the same case of wc. I don't mind the order in which my reduce gets the
> keys of a single partition. I guess Hadoop does an external sort for this. I
> want to disable that.
>
> Thanks,
> jS
>
> On Sun, Sep 11, 2011 at 7:03 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
>> The point of a 'reduce phase' is to aggregate keys from different maps
>> (i.e. all inputs).
>>
>> I'm not sure what you are trying to do, but a use-case will help.
>>
>> In any case, the only way to achieve what you are trying to do is to run two
>> jobs, with the first being a map-only job (i.e. #reduces = 0).
>>
>> Arun
>>
>> On Sep 10, 2011, at 10:19 PM, john smith wrote:
>>
>> > Hey,
>> >
>> > I have reduce phases too. But for each reduce, I don't need sorted input
>> > (map-output for that corresponding reduce task).
>> > Setting #red to 0 completely removes the reduce phase.
>> >
>> > Am I missing something?
>> >
>> > Thanks,
>> >
>> > On Sun, Sep 11, 2011 at 12:18 AM, Arun C Murthy <[EMAIL PROTECTED]>
>> wrote:
>> >
>> >> Run a map-only job with #reduces set to 0.
>> >>
>> >> Arun
>> >>
>> >> On Sep 10, 2011, at 2:06 AM, john smith wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Some of the MR jobs I run don't need sorting of map-output in each
>> >>> partition. Is there some way I can disable it?
>> >>>
>> >>> Any help?
>> >>>
>> >>> Thanks
>> >>> jS
>> >>
>> >>
>>
>>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434