MapReduce, mail # user - manipulating key in combine phase


Amit Sela 2014-01-12, 16:25
John Lilley 2014-01-13, 00:28
Devin Suiter RDX 2014-01-13, 13:06
Re: manipulating key in combine phase
Devin Suiter RDX 2014-01-13, 19:45
I believe the combine step runs after the sort/merge, so, no.

What comes out of a mapper is a set of records {k1, v1} {k1, v2} {k1, v(n)}
{k2, v1} {k2, v2} {k2, v(n)}, and then the reducer aggregates that into arrays
like {k1, {v1, v2, v(n)}}, {k2, {v1, v2, v(n)}} and performs its logic on the
value set for each unique key.
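A minimal, framework-free sketch of the grouping described above, assuming String keys and int values. The class and method names are hypothetical (not Hadoop APIs), and `List.of` needs Java 9+. In a real job the framework's sort/shuffle does this between map() and reduce(); here a TreeMap stands in for the sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not a Hadoop API: simulates what the framework's
// sort/shuffle does between map() and reduce() for a handful of records.
class ShuffleSimulation {

    // A single map output record {key, value}.
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    // Collects each key's values into one list, with keys in sorted order:
    // {k1, {v1, v2, ...}}, {k2, {v1, v2, ...}}. Values keep arrival order here;
    // Hadoop itself makes no ordering guarantee for values without a secondary sort.
    static Map<String, List<Integer>> groupByKey(List<Pair> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // TreeMap keeps keys sorted
        for (Pair p : mapOutput) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = List.of(
            new Pair("k2", 1), new Pair("k1", 1), new Pair("k1", 2), new Pair("k2", 2));
        System.out.println(groupByKey(mapOutput)); // {k1=[1, 2], k2=[1, 2]}
    }
}
```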

What comes out of a combiner is {k1, {v1, v2, v(n)}}, {k2, {v1, v2, v(n)}},
the same {k, v} map that the reducer builds, and then the reducer does its
logic on the value set for each unique key.
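For comparison, here is a minimal simulation of what a combiner typically does, assuming a word-count-style sum combiner (an illustrative choice, not something from this thread; the class names are hypothetical). In Hadoop the combiner's output key/value types must match the map output types, because its output goes back into the same intermediate stream and may be combined again. So the sketch emits plain {key, partialSum} pairs with the key passed through unchanged, and running it twice gives the same result, which matters because the combiner may run zero, one, or more times.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only (hypothetical names, no Hadoop classes): a sum-style
// combiner applied to one mapper's output after it has been sorted by key.
class SumCombinerSimulation {

    static final class Pair {
        final String key;
        final long value;
        Pair(String key, long value) { this.key = key; this.value = value; }
        @Override public String toString() { return "{" + key + ", " + value + "}"; }
    }

    // Walks the key-sorted input and emits one {key, partialSum} per run of
    // equal keys. The key is passed through unchanged, so the output is still
    // sorted and every record still belongs to the same reducer partition.
    static List<Pair> combine(List<Pair> sortedInput) {
        List<Pair> out = new ArrayList<>();
        int i = 0;
        while (i < sortedInput.size()) {
            String key = sortedInput.get(i).key;
            long sum = 0;
            while (i < sortedInput.size() && sortedInput.get(i).key.equals(key)) {
                sum += sortedInput.get(i).value;
                i++;
            }
            out.add(new Pair(key, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pair> spill = List.of(
            new Pair("k1", 1), new Pair("k1", 1), new Pair("k1", 1), new Pair("k2", 1));
        List<Pair> once = combine(spill);
        List<Pair> twice = combine(once); // the framework may run the combiner again
        System.out.println(once);  // [{k1, 3}, {k2, 1}]
        System.out.println(twice); // [{k1, 3}, {k2, 1}]
    }
}
```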

If you change the key in the combiner, you aren't working with the same
set anymore, so you've essentially used your combiner as another mapper. But
your method signature won't be right.
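The problem with renaming keys in the combiner can be made concrete with a simplified model. It assumes, as far as I understand Hadoop's map-side spill, that combiner output is written straight into the partition segment currently being spilled, without being re-sorted or re-partitioned. The check below tests the two properties a renamed key can break. `partitionFor` mirrors the formula of Hadoop's HashPartitioner, but `String.hashCode()` stands in for the Writable key's `hashCode()` and String order stands in for the job's sort comparator; `RekeyCheck` and `safeForSpill` are hypothetical names. Either failure is one plausible way to see the same key reduced more than once.

```java
import java.util.List;

// Simulation with hypothetical names. partitionFor mirrors the formula used by
// Hadoop's HashPartitioner; String.hashCode() and String.compareTo() stand in
// for the real key's hashCode() and the job's sort comparator.
class RekeyCheck {

    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // True only if every key the combiner emitted still belongs to `partition`
    // and the keys are still in non-decreasing order, i.e. the output could be
    // appended to that partition's spill segment without re-sorting.
    static boolean safeForSpill(List<String> combinerOutputKeys, int partition, int numReduceTasks) {
        for (int i = 0; i < combinerOutputKeys.size(); i++) {
            String k = combinerOutputKeys.get(i);
            if (partitionFor(k, numReduceTasks) != partition) return false;
            if (i > 0 && combinerOutputKeys.get(i - 1).compareTo(k) > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Two reducers: "a" hashes to partition 1, but "a1" hashes to partition 0.
        System.out.println(safeForSpill(List.of("a", "a1"), 1, 2));              // false
        // One reducer, so only ordering matters: "row_x" sorts after "row1".
        System.out.println(safeForSpill(List.of("row", "row_x", "row1"), 0, 1)); // false
        // Passing keys through unchanged keeps both properties.
        System.out.println(safeForSpill(List.of("row", "row1"), 0, 1));          // true
    }
}
```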

The combiner is designed solely to reduce network traffic from mappers to
reducers. Since there are usually more mappers than reducers, it reduces
bottlenecking at switches.
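A back-of-the-envelope model of that traffic saving, assuming a sum-style combiner that emits one record per distinct key. It also assumes, optimistically, that the combiner sees each map task's whole output at once; in practice it runs per spill, so real counts can be higher. All names and numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical model: counts intermediate records sent over the network with
// and without a sum-style combiner. Each inner list is one map task's output keys.
class ShuffleVolume {

    static long recordsWithoutCombiner(List<List<String>> keysPerMapTask) {
        long total = 0;
        for (List<String> task : keysPerMapTask) total += task.size();
        return total;
    }

    static long recordsWithCombiner(List<List<String>> keysPerMapTask) {
        long total = 0;
        for (List<String> task : keysPerMapTask) {
            total += new HashSet<>(task).size(); // one combined record per distinct key
        }
        return total;
    }

    // Builds a synthetic map task emitting `records` records cycling over `distinctKeys` keys.
    static List<String> syntheticTask(int records, int distinctKeys) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < records; i++) keys.add("k" + (i % distinctKeys));
        return keys;
    }

    public static void main(String[] args) {
        List<List<String>> tasks = List.of(
            syntheticTask(1000, 10), syntheticTask(1000, 10), syntheticTask(1000, 10));
        System.out.println(recordsWithoutCombiner(tasks)); // 3000
        System.out.println(recordsWithCombiner(tasks));    // 30
    }
}
```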

If you want to change the key after you've set it, I feel like you should
use ChainMapper and/or write custom input/output format classes if you need
to.
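A different route, closer to John's suggestion in the quoted message below, is to split the key in the mapper, where changing keys is expected. The sketch is plain Java with no Hadoop dependency; `KeySplitter`, `keyFor`, and `MAX_PER_KEY` are hypothetical names, and the threshold of 100 comes from Amit's question. Inside a real `Mapper.map()` the returned key would be wrapped in the job's key type and passed to `context.write(...)`.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (hypothetical names) of splitting an oversized key inside
// the mapper instead of the combiner.
class KeySplitter {
    static final int MAX_PER_KEY = 100; // threshold from the original question

    // Counts are local to one map task; other map tasks keep their own counts.
    private final Map<String, Integer> seen = new HashMap<>();

    // Returns the key to emit for the next value of `key`: the key itself for
    // the first MAX_PER_KEY values, then key#1 for the next MAX_PER_KEY, then key#2, ...
    String keyFor(String key) {
        int n = seen.merge(key, 1, Integer::sum); // 1-based count of values seen so far
        int bucket = (n - 1) / MAX_PER_KEY;
        return bucket == 0 ? key : key + "#" + bucket;
    }
}
```

Two design notes, both assumptions worth checking against the real job. A deterministic suffix (instead of a random UUID) means a re-executed map attempt over the same split produces the same keys. Because the counts are per map task, a key whose values are spread across several map tasks can still collect more than 100 values under each suffix overall; a hard global limit would have to be enforced on the reduce side or in a follow-up job.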

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com
On Mon, Jan 13, 2014 at 12:39 PM, Amit Sela <[EMAIL PROTECTED]> wrote:

> More than a solution, I'd like to know if a combiner is allowed to change
> the key? Will it interfere with the mapper's sort/merge?
>
>
> On Mon, Jan 13, 2014 at 3:06 PM, Devin Suiter RDX <[EMAIL PROTECTED]> wrote:
>
>> Amit,
>>
>> Have you explored the ChainMapper class?
>>
>>
>>
>> On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>>
>>> Isn’t this what you’d normally do in the Mapper?
>>>
>>> My understanding of the combiner is that it is like a “mapper-side
>>> pre-reducer” and operates on blocks of data that have already been sorted
>>> by key, so mucking with the keys doesn’t **seem** like a good idea.
>>>
>>> john
>>>
>>> *From:* Amit Sela [mailto:[EMAIL PROTECTED]]
>>> *Sent:* Sunday, January 12, 2014 9:26 AM
>>> *To:* [EMAIL PROTECTED]
>>> *Subject:* manipulating key in combine phase
>>>
>>> Hi all,
>>>
>>> I was wondering if it is possible to manipulate the key during combine:
>>>
>>> Say I have a MapReduce job where the key has many qualifiers.
>>> I would like to "split" the key into two (or more) keys if it has more
>>> than, say, 100 qualifiers.
>>>
>>> In the combiner class I would do something like:
>>>
>>> int count = 0;
>>> for (Writable value : values) {
>>>   // the first 100 values keep the original key; the rest go to newKey
>>>   if (++count > 100) {
>>>     context.write(newKey, value);
>>>   } else {
>>>     context.write(key, value);
>>>   }
>>> }
>>>
>>> where newKey is something like key + randomUUID.
>>>
>>> I know that the combiner can be called "zero, once or more..." and I'm
>>> getting strange results (same key written more than once), so I would be
>>> glad to get some deeper insight into how the combiner works.
>>>
>>> Thanks,
>>>
>>> Amit.
>>>
>>
>>
>