MapReduce, mail # user - manipulating key in combine phase


RE: manipulating key in combine phase
John Lilley 2014-01-13, 00:28
Isn't this what you'd normally do in the Mapper?
My understanding of the combiner is that it is like a "mapper-side pre-reducer" and operates on blocks of data that have already been sorted by key, so mucking with the keys doesn't *seem* like a good idea.
john

From: Amit Sela [mailto:[EMAIL PROTECTED]]
Sent: Sunday, January 12, 2014 9:26 AM
To: [EMAIL PROTECTED]
Subject: manipulating key in combine phase

Hi all,

I was wondering if it is possible to manipulate the key during combine:

Say I have a mapreduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more than, say, 100 qualifiers.
In the combiner class I would do something like:

int count = 0;
for (Writable value: values) {
  if (++count >= 100){
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}

where newKey is something like key+randomUUID

I know that the combiner can be called "zero, once or more..." times, and I'm getting strange results (the same key written more than once), so I would be glad to get some deeper insight into how the combiner works.

Thanks,

Amit.
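
For illustration, the suggestion in the reply (do the split in the Mapper rather than the combiner) might look roughly like the sketch below. It is plain Java with no Hadoop dependency, so it only simulates the map and shuffle steps. The class name KeySplitSketch, the MAX_PER_KEY limit, and the "#bucket" suffix are all hypothetical choices made for this example; a deterministic counter-based suffix is used here in place of the randomUUID from the question, so that re-running the step produces the same keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop classes) of splitting a "wide" key on the map side.
// All names here are hypothetical and exist only for this illustration.
class KeySplitSketch {
    // Assumed limit, mirroring the "100 qualifiers" in the question.
    static final int MAX_PER_KEY = 100;

    // Stand-in for the map phase: emits (possibly suffixed key, value) pairs.
    // The suffix is chosen BEFORE any sorting or grouping happens.
    static List<String[]> mapSide(List<String[]> input) {
        Map<String, Integer> seen = new HashMap<>();
        List<String[]> out = new ArrayList<>();
        for (String[] kv : input) {
            // Zero-based count of values emitted so far for this key.
            int n = seen.merge(kv[0], 1, Integer::sum) - 1;
            int bucket = n / MAX_PER_KEY;
            String outKey = bucket == 0 ? kv[0] : kv[0] + "#" + bucket;
            out.add(new String[] {outKey, kv[1]});
        }
        return out;
    }

    // Stand-in for shuffle/sort: group values by key, in sorted key order.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            input.add(new String[] {"row1", "q" + i});
        }
        // Prints one line per output key with the number of values grouped under it.
        shuffle(mapSide(input))
            .forEach((k, v) -> System.out.println(k + " -> " + v.size() + " values"));
    }
}
```

With 250 values for a single key, this yields three groups (100, 100 and 50 values). Note that in a real job the counter would be local to each map task, so the same suffixed key could still arrive from several tasks and a group could exceed the limit on the reduce side; the sketch only shows that choosing the key before the sort keeps the grouping consistent.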