
Russell Jurney 2013-02-02, 08:10
Russell Jurney 2013-02-02, 07:30
Re: how to find top N values using map-reduce ?

Can you tell us more about:
 * How big N is
 * How big the input dataset is
 * How many mappers you have
 * Whether input splits correlate with the sorting criterion for top N

Depending on the answers, very different strategies will be optimal.

On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:

> I am looking for a better solution for this.
> One way to do this would be to find the top N values in each mapper and
> then find the top N among those in one reducer. I am afraid that
> this won't work efficiently if N is larger than the number of values in
> an input split (a single mapper's input).
> The other way is to just sort all of them in one reducer and then take
> the top N.
> Is there a better approach?
> Regards
> Praveenesh
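
For reference, here is a minimal sketch of the first approach from the quoted message (local top N per mapper, then a global top N in a single reducer). It is plain Python that simulates the two phases in memory, not Hadoop code; the function names mapper_top_n and reducer_top_n and the sample splits are hypothetical, chosen only for illustration.

```python
import heapq

def mapper_top_n(values, n):
    """Local top N for one input split.

    If the split holds fewer than n values, all of them are returned.
    """
    return heapq.nlargest(n, values)

def reducer_top_n(partial_lists, n):
    """Merge the per-mapper candidates and keep the global top N."""
    merged = (v for partial in partial_lists for v in partial)
    return heapq.nlargest(n, merged)

# Hypothetical input: three splits, with N larger than some split sizes.
splits = [[5, 1, 9], [7, 3], [8, 2, 6, 4]]
n = 4

partials = [mapper_top_n(split, n) for split in splits]
top = reducer_top_n(partials, n)
print(top)
```

When N exceeds the size of a split, that mapper simply forwards every value it saw, so the result stays correct, but the single reducer then receives close to the full dataset, which is the efficiency concern raised in the quoted message.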

Eugene Kirpichov
http://jkff.info/software/timeplotters - my performance visualization tools