
Amit Sela 2014-01-12, 16:25
RE: manipulating key in combine phase
Isn't this what you'd normally do in the Mapper?
My understanding of the combiner is that it is like a "mapper-side pre-reducer" and operates on blocks of data that have already been sorted by key, so mucking with the keys doesn't *seem* like a good idea.
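
As a rough sketch of doing the split on the map side instead: the class below is plain Java with no Hadoop dependency, and `KeySplitter`, `nextKey`, and the `#` suffix format are hypothetical names chosen only for illustration. It assumes the mapper keeps a per-task count of the records it has emitted for each original key and moves to a new suffix every `limit` records; the returned string would then be used as the map output key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: picks the output key for each record a mapper emits. */
class KeySplitter {
    private final int limit;  // max records per derived key (assumed, e.g. 100)
    private final Map<String, Integer> seen = new HashMap<>();  // records seen per original key

    KeySplitter(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    /** Returns key#0 for the first `limit` records of a key, key#1 for the next `limit`, and so on. */
    String nextKey(String key) {
        int index = seen.merge(key, 1, Integer::sum) - 1;  // zero-based position of this record
        return key + "#" + (index / limit);
    }

    public static void main(String[] args) {
        KeySplitter splitter = new KeySplitter(100);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 250; i++) {
            counts.merge(splitter.nextKey("row1"), 1, Integer::sum);
        }
        System.out.println(counts);  // record count per derived key
    }
}
```

Because the keys are already final when they leave the mapper, a combiner can still aggregate per derived key without touching it. Note that the counts are per map task, so the same derived key coming from different tasks would still meet at one reducer, and the `seen` map grows with the number of distinct keys a task processes.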

From: Amit Sela [mailto:[EMAIL PROTECTED]]
Sent: Sunday, January 12, 2014 9:26 AM
Subject: manipulating key in combine phase

Hi all,

I was wondering if it is possible to manipulate the key during combine:

Say I have a mapreduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more than, say, 100 qualifiers.
In the combiner class I would do something like:

int count = 0;
for (Writable value : values) {
  if (++count >= 100) {
    // from the 100th value onward, emit under the derived key
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}

where newKey is something like key+randomUUID

I know that the combiner can be called "zero, once or more..." times, and I'm getting strange results (the same key written more than once), so I would be glad to get some deeper insight into how the combiner works.
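
To get a feel for why the output depends on how many times the combiner runs, here is a small plain-Java simulation (no Hadoop dependency). Everything in it is a hypothetical stand-in: `combine` mimics the snippet above on one sorted key group, a counter replaces `randomUUID` so the run is repeatable, and a "spill" is just a sorted map. It assumes the framework may run the combiner once per spill and once more when spills are merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSimulation {
    static int suffix = 0;  // stand-in for randomUUID so that runs are repeatable

    /** Mimics the combiner above for one key group: the 100th value onward goes to a new key. */
    static void combine(String key, List<Integer> values, Map<String, List<Integer>> out) {
        String newKey = key + "-" + (suffix++);
        int count = 0;
        for (Integer value : values) {
            String target = (++count >= 100) ? newKey : key;
            out.computeIfAbsent(target, k -> new ArrayList<>()).add(value);
        }
    }

    /** Runs the combiner over every key group of one sorted input. */
    static Map<String, List<Integer>> combineAll(Map<String, List<Integer>> in) {
        Map<String, List<Integer>> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : in.entrySet()) {
            combine(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    /** Merges two sorted outputs by key, as when spills are merged. */
    static Map<String, List<Integer>> merge(Map<String, List<Integer>> a,
                                            Map<String, List<Integer>> b) {
        Map<String, List<Integer>> out = new TreeMap<>();
        a.forEach((k, v) -> out.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        b.forEach((k, v) -> out.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        return out;
    }

    /** Builds a spill holding n values for a single key. */
    static Map<String, List<Integer>> spill(String key, int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        Map<String, List<Integer>> m = new TreeMap<>();
        m.put(key, values);
        return m;
    }

    public static void main(String[] args) {
        // Two spills, each with 150 values for the same key, combined separately.
        Map<String, List<Integer>> s1 = combineAll(spill("k", 150));
        Map<String, List<Integer>> s2 = combineAll(spill("k", 150));
        // The combiner may run again over the merged spills.
        Map<String, List<Integer>> merged = combineAll(merge(s1, s2));
        merged.forEach((k, v) -> System.out.println(k + " -> " + v.size() + " values"));
    }
}
```

In this run the reducer side would see four keys (`k`, `k-0`, `k-1`, `k-2`) if the merge-time pass happens, three if it does not, and a single key `k` with all 300 values if the combiner never runs. A combiner is normally expected to leave the job's result unchanged no matter how many times it is applied, which a key-rewriting combiner like this one does not.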


Devin Suiter RDX 2014-01-13, 13:06
Devin Suiter RDX 2014-01-13, 19:45