Hadoop >> mail # user >> Re: Reduce Task Clarification


Re: Reduce Task Clarification
Are you looking to do a secondary sort under a grouped key?

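reduce() is called once per distinct key emitted by map() (more
precisely, once per group of keys that the grouping comparator treats
as equal), and it receives all the values grouped under that key. The
framework sorts keys, not values, so to have the grouped values arrive
in order you need a 'secondary sort': move the value into a composite
key, order that key with a separate sort comparator, and group and
partition on the original key only.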
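
To make the idea concrete, here is a minimal sketch in plain Java (no
Hadoop dependency) that only simulates what the shuffle does during a
secondary sort. The class SecondarySortSketch, the CompositeKey type and
the run() helper are hypothetical names made up for illustration. In a
real job the two comparators would be registered with
Job.setSortComparatorClass and Job.setGroupingComparatorClass, and you
would most likely also need a partitioner (Job.setPartitionerClass) that
looks at the natural key only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    /** Hypothetical composite key: the natural key plus the value to order by. */
    static final class CompositeKey {
        final String naturalKey;
        final long value;

        CompositeKey(String naturalKey, long value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Plays the role of the sort comparator: natural key first, then value.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingLong(k -> k.value);

    // Plays the role of the grouping comparator: natural key only.
    static final Comparator<CompositeKey> GROUP =
            Comparator.comparing((CompositeKey k) -> k.naturalKey);

    /** Simulates the shuffle: returns one string per simulated reduce() call. */
    static List<String> run(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);

        List<String> reduceCalls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            CompositeKey first = sorted.get(i);
            List<Long> values = new ArrayList<>();
            // Keys the grouping comparator treats as equal share one reduce() call.
            while (i < sorted.size() && GROUP.compare(first, sorted.get(i)) == 0) {
                values.add(sorted.get(i).value);
                i++;
            }
            reduceCalls.add(first.naturalKey + "=" + values);
        }
        return reduceCalls;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
                new CompositeKey("b", 7L), new CompositeKey("a", 42L),
                new CompositeKey("a", 3L), new CompositeKey("b", 1L));
        // Prints "a=[3, 42]" then "b=[1, 7]".
        run(mapOutput).forEach(System.out::println);
    }
}
```

Each natural key gets exactly one simulated reduce() call, and the
values inside that call are already in ascending order because the
ordering was done on the composite key before grouping.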

On Wed, Aug 14, 2013 at 1:21 AM, Sam Garrett <[EMAIL PROTECTED]> wrote:
> I am working on a MapReduce job where I would like to have the output sorted
> by a LongWritable value. I read the Anatomy of a MapReduce Run in the
> Definitive Guide and it didn't say explicitly whether reduce() gets called
> only once per map output key. If it does get called only once I was thinking
> that I could use this:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)
> to do the sorting.
>
> Thank you for your time.
>
> --
> Sam Garrett
> ActionX, NYC

--
Harsh J