Re: how to find top N values using map-reduce ?
Hi,

Can you tell us more about:
 * How big N is
 * How big the input dataset is
 * How many mappers you have
 * Whether the input splits correlate with the sorting criterion for top N

Depending on the answers, very different strategies will be optimal.

On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:

> I am looking for a better solution for this.
>
> One way to do this would be to find the top N values in each mapper and
> then find the top N among them in a single reducer. I am afraid that
> this won't work effectively if my N is larger than the number of values in
> my input split (or mapper input).
>
> The other way is to just sort all of them in one reducer and then output
> the top N.
>
> Wondering if there is a better approach to do this?
>
> Regards
> Praveenesh
>
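
For reference, the first approach in the quoted message (a local top N in each mapper, then a final top N in a single reducer) can be sketched roughly as below. This is a plain Python simulation, not Hadoop code: the names mapper_top_n and reducer_top_n and the sample splits are made up for illustration, and integer values with "largest first" ordering are assumed.

```python
import heapq
from typing import Iterable, List


def mapper_top_n(split: Iterable[int], n: int) -> List[int]:
    # Hypothetical per-mapper step: keep only the n largest values of one split.
    # If the split holds fewer than n values, all of them are emitted.
    return heapq.nlargest(n, split)


def reducer_top_n(partials: Iterable[List[int]], n: int) -> List[int]:
    # Hypothetical single-reducer step: merge the local candidates and
    # keep the n largest overall.
    return heapq.nlargest(n, (value for part in partials for value in part))


# Made-up input splits; the second one is smaller than n on purpose.
splits = [[5, 1, 9], [7, 3], [8, 2, 6, 4]]
n = 4

partials = [mapper_top_n(split, n) for split in splits]
result = reducer_top_n(partials, n)
print(result)
```

Under these assumptions the result stays correct even when N exceeds the size of a split, because such a mapper simply forwards everything it has. The cost is that the reducer receives up to (number of mappers) x N values, which is why the answers to the questions above matter.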

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov
http://jkff.info/software/timeplotters - my performance visualization tools