MapReduce >> mail # user >> manipulating key in combine phase


Amit Sela 2014-01-12, 16:25
John Lilley 2014-01-13, 00:28
Re: manipulating key in combine phase
Amit,

Have you explored the ChainMapper class?
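
One way to read this suggestion is to do the key split in the map phase, for example in a second mapper chained after the first, rather than in the combiner. The following is a minimal sketch of that idea in plain Java with no Hadoop dependency. The class name `MapSideKeySplitter`, the `#` separator, and the deterministic bucket suffix (swapped in for the random UUID from the question) are all assumptions made for illustration, not part of any Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: decides which key a map-side step should emit.
public class MapSideKeySplitter {
    private final int maxPerKey;
    // Number of records seen so far for each original key (local to this instance).
    private final Map<String, Integer> seen = new HashMap<>();

    public MapSideKeySplitter(int maxPerKey) {
        this.maxPerKey = maxPerKey;
    }

    // Returns the original key for the first maxPerKey records,
    // then key#1, key#2, ... for each further block of maxPerKey records.
    public String keyFor(String key) {
        int index = seen.merge(key, 1, Integer::sum) - 1; // zero-based record index
        int bucket = index / maxPerKey;
        return bucket == 0 ? key : key + "#" + bucket;
    }

    public static void main(String[] args) {
        MapSideKeySplitter splitter = new MapSideKeySplitter(100);
        for (int i = 0; i < 100; i++) {
            splitter.keyFor("row"); // records 1..100 keep the key "row"
        }
        System.out.println(splitter.keyFor("row")); // prints row#1
    }
}
```

Note that the counter is local to one instance, so in a real job each map task would count independently and a reducer could still receive more than `maxPerKey` values for a given bucket.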

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com
On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <[EMAIL PROTECTED]> wrote:

> Isn’t this what you’d normally do in the Mapper?
>
> My understanding of the combiner is that it is like a “mapper-side
> pre-reducer” and operates on blocks of data that have already been sorted
> by key, so mucking with the keys doesn’t **seem** like a good idea.
>
> john
>
>
>
> *From:* Amit Sela [mailto:[EMAIL PROTECTED]]
> *Sent:* Sunday, January 12, 2014 9:26 AM
> *To:* [EMAIL PROTECTED]
> *Subject:* manipulating key in combine phase
>
>
>
> Hi all,
>
>
>
> I was wondering if it is possible to manipulate the key during combine:
>
>
>
> Say I have a mapreduce job where the key has many qualifiers.
>
> I would like to "split" the key into two (or more) keys if it has more
> than, say, 100 qualifiers.
>
> In the combiner class I would do something like:
>
>
>
> int count = 0;
> for (Writable value : values) {
>   if (++count >= 100) {
>     context.write(newKey, value);
>   } else {
>     context.write(key, value);
>   }
> }
>
>
>
> where newKey is something like key+randomUUID
>
>
>
> I know that the combiner can be called "zero, once or more..." and I'm
> getting strange results (same key written more than once), so I would be
> glad to get some deeper insight into how the combiner works.
>
>
>
> Thanks,
>
>
>
> Amit.
>
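
To make the "zero, once or more" point concrete, here is a small plain-Java simulation of the combiner loop quoted above. It is only a sketch under a stated assumption: each combiner invocation sees just the values of one spill, so the counter restarts for every invocation. The class and method names are hypothetical and nothing here calls Hadoop.

```java
// Hypothetical simulation of the quoted combiner loop; no Hadoop involved.
public class CombinerSplitSketch {

    // One combiner invocation over valueCount values for a single key.
    // Returns how many values would stay under the original key.
    public static int keptUnderOriginalKey(int valueCount, int threshold) {
        int count = 0;
        int kept = 0;
        for (int i = 0; i < valueCount; i++) {
            if (++count >= threshold) {
                // corresponds to context.write(newKey, value)
            } else {
                kept++; // corresponds to context.write(key, value)
            }
        }
        return kept;
    }

    // Assumption: one independent combiner invocation per spill.
    public static int keptAcrossSpills(int[] spillSizes, int threshold) {
        int total = 0;
        for (int size : spillSizes) {
            total += keptUnderOriginalKey(size, threshold);
        }
        return total;
    }

    public static void main(String[] args) {
        // 250 values in a single combiner pass
        System.out.println(keptAcrossSpills(new int[] {250}, 100)); // prints 99
        // The same 250 values spread over three spills
        System.out.println(keptAcrossSpills(new int[] {100, 100, 50}, 100)); // prints 248
    }
}
```

The same 250 values leave 99 or 248 records under the original key depending on how they are split across invocations, and all 250 if the combiner is never run. The loop also writes the original key once per value instead of aggregating, so seeing the same key written more than once is expected with this code.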
Devin Suiter RDX 2014-01-13, 19:45