Re: Reduce Task Clarification
Are you looking to do a secondary sort under a grouped key?

A reduce() is called once for each globally unique map() emitted key,
along with all values grouped for it. To sort the grouped data, you
need to use a separate sort comparator and perform the 'secondary
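
To make the grouping-versus-sorting distinction concrete, here is a rough
plain-Java sketch that only simulates the shuffle in memory. It is not Hadoop
code: CompositeKey, SORT, GROUP and simulateReduceCalls are hypothetical names
made up for illustration. In a real job, the two comparators would play the
roles configured through Job.setSortComparatorClass and
Job.setGroupingComparatorClass, assuming the map output key carries both the
natural key and the long value to sort by.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SecondarySortSketch {

    /** Hypothetical composite map output key: natural key plus the value to order by. */
    static final class CompositeKey {
        final String naturalKey;
        final long sortValue;

        CompositeKey(String naturalKey, long sortValue) {
            this.naturalKey = naturalKey;
            this.sortValue = sortValue;
        }
    }

    // Plays the role of the sort comparator: natural key first, then the long value.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingLong(k -> k.sortValue);

    // Plays the role of the grouping comparator: looks at the natural key only.
    static final Comparator<CompositeKey> GROUP =
            Comparator.comparing((CompositeKey k) -> k.naturalKey);

    /**
     * Simulates the reduce side: sort every key with SORT, then start a new
     * "reduce() call" whenever GROUP says the natural key has changed.
     * Each returned string describes one simulated call.
     */
    static List<String> simulateReduceCalls(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);

        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            CompositeKey first = sorted.get(i);
            List<Long> values = new ArrayList<>();
            // Collect every record that groups with the first key of this run.
            while (i < sorted.size() && GROUP.compare(first, sorted.get(i)) == 0) {
                values.add(sorted.get(i).sortValue);
                i++;
            }
            calls.add(first.naturalKey + "=" + values);
        }
        return calls;
    }
}
```

With map output (b,3), (a,9), (a,2), (b,1), the sketch produces two simulated
calls, one per natural key, and the values inside each call arrive in
ascending order. Sorting on the composite key while grouping on the natural
key alone is what yields ordered values within a single reduce() call.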

On Wed, Aug 14, 2013 at 1:21 AM, Sam Garrett <[EMAIL PROTECTED]> wrote:
> I am working on a MapReduce job where I would like to have the output sorted
> by a LongWritable value. I read the Anatomy of a MapReduce Run in the
> Definitive Guide and it didn't say explicitly whether reduce() gets called
> only once per map output key. If it does get called only once I was thinking
> that I could use this:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)
> to do the sorting.
> Thank you for your time.
> --
> Sam Garrett
> ActionX, NYC

Harsh J