Re: Quick Clarification of sort mechanism
Hi Rob,

Sorting is an internal mechanism in Hadoop: the framework always sorts the
map output by key before the reduce step sees it, so nothing in the reduce
method itself specifies the order.
If you want to sort the result by count, you could start a second job that
takes the output of the first job as its input and uses the count as the
key and the word as the value.
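
To illustrate the second pass, here is a minimal sketch in plain Java. It
does not use the Hadoop API; the class and method names (SortByCount, swap,
secondJob) are hypothetical, and the explicit sort call only stands in for
the sort-by-key that the framework would do between map and reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortByCount {

    // "Map" step of the hypothetical second job:
    // turn a (word, count) record into a (count, word) record.
    static Map.Entry<Integer, String> swap(String word, int count) {
        return new AbstractMap.SimpleEntry<>(count, word);
    }

    // Simulates the second job: swap every record, then sort by the new key.
    // In a real job the framework performs this sort, not user code.
    static List<Map.Entry<Integer, String>> secondJob(Map<String, Integer> firstJobOutput) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        for (Map.Entry<String, Integer> e : firstJobOutput.entrySet()) {
            records.add(swap(e.getKey(), e.getValue()));
        }
        // Ascending order by count; the sort is stable, so ties keep input order.
        records.sort(Map.Entry.comparingByKey());
        return records;
    }

    public static void main(String[] args) {
        // Assumed output of the first (WordCount) job, ordered by word.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("apple", 2);
        counts.put("the", 5);
        counts.put("zebra", 1);

        for (Map.Entry<Integer, String> record : secondJob(counts)) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

Note that this orders the counts from lowest to highest; getting the most
frequent words first would need a reversed comparison on the key.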

On Fri, Jan 15, 2010 at 2:42 PM, Rob Stewart <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am having a look at the WordCount java example here:
>
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Walk-through
>
> I want a word count application that, instead of sorting by key
> (alphabetically by word), sorts by the count (frequency) of the
> words.
>
> I can't see where in the reduce method in the above example the
> key/values are specified to be ordered alphabetically by key, or how I can
> override this to sort by the value of the final reduce (i.e. by the
> frequency).
>
> Thanks,
>
> Rob Stewart
>

--
Best Regards

Jeff Zhang