MapReduce >> mail # user >> manipulating key in combine phase


Amit Sela 2014-01-12, 16:25
RE: manipulating key in combine phase
Isn't this what you'd normally do in the Mapper?
My understanding of the combiner is that it is like a "mapper-side pre-reducer" and operates on blocks of data that have already been sorted by key, so mucking with the keys doesn't *seem* like a good idea.
john
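
To illustrate the mapper-side approach suggested above, here is a minimal plain-Java sketch of splitting a key before it ever reaches the combiner. It uses no Hadoop classes. `KeySalter`, `saltKey`, the `#` separator, and the bucket count are hypothetical names and choices made for illustration, not anything from the thread. The assumption is that the sub-key is derived from the value itself, so the same (key, value) pair always lands on the same sub-key no matter how often the combiner runs.

```java
/** Hypothetical mapper-side helper; plain Java, no Hadoop classes. */
class KeySalter {

    /**
     * Appends a bucket number derived from the value's hash code,
     * so a given (key, value) pair always maps to the same sub-key.
     */
    static String saltKey(String key, String value, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        // floorMod keeps the bucket in [0, buckets) even for negative hash codes.
        int bucket = Math.floorMod(value.hashCode(), buckets);
        return key + "#" + bucket;
    }

    public static void main(String[] args) {
        // Each qualifier is assigned to one of 4 sub-keys of "row1".
        for (String qualifier : new String[] {"q1", "q2", "q3", "q42"}) {
            System.out.println(saltKey("row1", qualifier, 4) + " <- " + qualifier);
        }
    }
}
```

The trade-off in this sketch is that every key is split into up to `buckets` sub-keys, not only the keys that exceed some threshold, because a mapper sees one record at a time and cannot know the final count per key.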

From: Amit Sela [mailto:[EMAIL PROTECTED]]
Sent: Sunday, January 12, 2014 9:26 AM
To: [EMAIL PROTECTED]
Subject: manipulating key in combine phase

Hi all,

I was wondering if it is possible to manipulate the key during combine:

Say I have a mapreduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more than, say, 100 qualifiers.
In the combiner class I would do something like:

int count = 0;
for (Writable value: values) {
  if (++count >= 100){
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}

where newKey is something like key+randomUUID

I know that the combiner can be called "zero, once or more..." and I'm getting strange results (same key written more than once) so I would be glad to get some deeper insight into how the combiner works.

Thanks,

Amit.
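
One way to see why a count-based split in the combiner gives inconsistent results is a plain-Java simulation of the loop from the question. It uses no Hadoop classes. `CombinerSplitDemo`, `combine`, and the fixed `k#split` suffixes are hypothetical stand-ins (a fixed suffix replaces the random UUID so the run is repeatable). The sketch assumes the combiner is invoked separately for each spill of map output, so the counter restarts on every call.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java stand-in for the combiner loop in the question; no Hadoop classes. */
class CombinerSplitDemo {

    /** Mirrors the loop from the question: the first 99 values keep the key, the rest go to newKey. */
    static List<String[]> combine(String key, List<String> values, String newKey) {
        List<String[]> out = new ArrayList<>();
        int count = 0;
        for (String value : values) {
            if (++count >= 100) {
                out.add(new String[] {newKey, value});
            } else {
                out.add(new String[] {key, value});
            }
        }
        return out;
    }

    /** Counts how many emitted records carry the given key. */
    static long countKey(List<String[]> records, String key) {
        return records.stream().filter(r -> r[0].equals(key)).count();
    }

    /** Builds n dummy values named v0, v1, ... */
    static List<String> values(int n) {
        List<String> vs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            vs.add("v" + i);
        }
        return vs;
    }

    public static void main(String[] args) {
        List<String> all = values(250);

        // Case 1: one combiner call sees all 250 values for key "k".
        List<String[]> once = combine("k", all, "k#split");

        // Case 2: the same 250 values arrive in three spills; one combiner call per spill.
        List<String[]> perSpill = new ArrayList<>();
        perSpill.addAll(combine("k", all.subList(0, 84), "k#split-1"));
        perSpill.addAll(combine("k", all.subList(84, 167), "k#split-2"));
        perSpill.addAll(combine("k", all.subList(167, 250), "k#split-3"));

        System.out.println("one call:   k=" + countKey(once, "k")
                + " k#split=" + countKey(once, "k#split"));
        System.out.println("per spill:  k=" + countKey(perSpill, "k"));
    }
}
```

In the single-call case, 99 values stay under `k` and 151 move to `k#split`. In the three-spill case no call reaches the threshold, so all 250 values stay under `k` and the key is never split. The same input therefore produces different output depending on how the map output happens to be spilled, which is consistent with the "zero, once or more" contract.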
Devin Suiter RDX 2014-01-13, 13:06
Devin Suiter RDX 2014-01-13, 19:45