MapReduce >> mail # user >> manipulating key in combine phase


Amit Sela 2014-01-12, 16:25
John Lilley 2014-01-13, 00:28
Re: manipulating key in combine phase
Amit,

Have you explored the ChainMapper class?
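
If the split is moved to the map side (in the mapper itself, or in a second mapper chained after it), every derived key goes through the normal sort and partition steps. Below is a minimal sketch of that idea in plain Java with no Hadoop dependency. The names SaltedKeys, saltedKey and mapAndGroup, the "#" separator and the bucket count of 4 are all hypothetical, chosen only for illustration. It assumes each qualifier can be hashed to pick a sub-key, which spreads a large key over a fixed number of sub-keys rather than capping each one at exactly 100.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SaltedKeys {
    // Derive a sub-key from the original key and a stable hash of the qualifier.
    // The same (key, qualifier) pair always maps to the same sub-key.
    static String saltedKey(String key, String qualifier, int buckets) {
        int bucket = Math.floorMod(qualifier.hashCode(), buckets);
        return key + "#" + bucket;
    }

    // Stand-in for "map, then group by key": not Hadoop code, just an
    // in-memory illustration of which values end up under which sub-key.
    static Map<String, List<String>> mapAndGroup(String key, List<String> qualifiers, int buckets) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String q : qualifiers) {
            grouped.computeIfAbsent(saltedKey(key, q, buckets), k -> new ArrayList<>()).add(q);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical input: one key with 250 qualifiers.
        List<String> qualifiers = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            qualifiers.add("q" + i);
        }
        mapAndGroup("row1", qualifiers, 4)
                .forEach((k, v) -> System.out.println(k + " -> " + v.size() + " qualifiers"));
    }
}
```

In a real job the equivalent of saltedKey would be applied to the key before context.write in the mapper, and the reducer (or a later step) would have to strip the suffix again if the original key is needed.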

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com
On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <[EMAIL PROTECTED]> wrote:

> Isn’t this what you’d normally do in the Mapper?
>
> My understanding of the combiner is that it is like a “mapper-side
> pre-reducer” and operates on blocks of data that have already been sorted
> by key, so mucking with the keys doesn’t *seem* like a good idea.
>
> john
>
>
>
> *From:* Amit Sela [mailto:[EMAIL PROTECTED]]
> *Sent:* Sunday, January 12, 2014 9:26 AM
> *To:* [EMAIL PROTECTED]
> *Subject:* manipulating key in combine phase
>
>
>
> Hi all,
>
>
>
> I was wondering if it is possible to manipulate the key during combine:
>
>
>
> Say I have a MapReduce job where the key has many qualifiers.
>
> I would like to "split" the key into two (or more) keys if it has more
> than, say, 100 qualifiers.
>
> In the combiner class I would do something like:
>
>
>
> int count = 0;
> for (Writable value : values) {
>   if (++count >= 100) {
>     context.write(newKey, value);
>   } else {
>     context.write(key, value);
>   }
> }
>
>
>
> where newKey is something like key+randomUUID
>
>
>
> I know that the combiner can be called "zero, once or more..." and I'm
> getting strange results (same key written more than once) so I would be
> glad to get some deeper insight into how the combiner works.
>
>
>
> Thanks,
>
>
>
> Amit.
>
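
One consequence of the "zero, once or more" behaviour in the question can be shown without Hadoop. The sketch below replays the quoted loop in plain Java. CombinerCounterDemo, combine and countOf are hypothetical names, and the split of 120 values into two runs of 60 is an assumed scenario, not something taken from the job in question. Because the counter starts at zero on every call, a key whose values are spread over several combiner runs never reaches the threshold in any single run. The loop also emits one record per value, so the original key is written many times by design.

```java
import java.util.ArrayList;
import java.util.List;

final class CombinerCounterDemo {
    // Mirrors the loop from the question: returns the key emitted for each
    // value. The counter is local, so it restarts on every invocation.
    static List<String> combine(String key, String newKey, int values, int threshold) {
        List<String> emittedKeys = new ArrayList<>();
        int count = 0;
        for (int i = 0; i < values; i++) {
            emittedKeys.add(++count >= threshold ? newKey : key);
        }
        return emittedKeys;
    }

    // Counts how many emitted records carry the given key.
    static long countOf(List<String> keys, String key) {
        return keys.stream().filter(key::equals).count();
    }

    public static void main(String[] args) {
        // Assumed scenario: the same key is combined in two separate runs
        // of 60 values each (for example, once per spill).
        List<String> all = new ArrayList<>();
        all.addAll(combine("row1", "row1-uuidA", 60, 100));
        all.addAll(combine("row1", "row1-uuidB", 60, 100));
        System.out.println("records under row1 after two runs: " + countOf(all, "row1"));

        // A single run over 150 values does split, but only within that run.
        List<String> single = combine("row1", "row1-uuidC", 150, 100);
        System.out.println("records under row1 after one run: " + countOf(single, "row1"));
    }
}
```

In the two-run case all 120 records keep the original key, so the limit of 100 is not enforced across runs. This only illustrates the counting logic. It does not model how Hadoop sorts or merges combiner output.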
Devin Suiter RDX 2014-01-13, 19:45