NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hadoop >> mail # user >> Re: manipulating key in combine phase


Re: manipulating key in combine phase
More than a solution, I'd like to know whether a combiner is allowed to change
the key. Will it interfere with the mapper's sort/merge?
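The answer depends on where the combiner sits in the pipeline. In Hadoop, each map output record is assigned to a partition when the mapper emits it; the records of each partition are then sorted, and the combiner runs over that sorted data as it is spilled. Its output is written back to the same partition and is not re-partitioned or re-sorted. Later merges (of spill files on the map side, of map outputs on the reduce side) assume every input run is already sorted, and reduce() is called once per run of adjacent equal keys. The plain-Java simulation below is a sketch of that last step, not Hadoop code: `merge` and `reduceCalls` are hypothetical stand-ins for the framework's merge and grouping, and the key "af" stands in for a key + randomUUID rewrite that happens to sort after a neighbouring key ("ab").

```java
import java.util.ArrayList;
import java.util.List;

public class CombinerKeyOrderDemo {

    // Two-way merge that, like the framework's merges, assumes each
    // input run is already sorted by key and never re-sorts it.
    public static List<String> merge(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // A new reduce() call starts whenever the key differs from the previous
    // record's key, so only adjacent equal keys share a call.
    public static List<String> reduceCalls(List<String> merged) {
        List<String> calls = new ArrayList<>();
        String previous = null;
        for (String key : merged) {
            if (!key.equals(previous)) {
                calls.add(key);
                previous = key;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Spill 1 after a key-preserving combiner: still sorted.
        List<String> spill1Ok = List.of("a", "a", "ab");
        // Spill 1 after a combiner that rewrote the second "a" to "af"
        // (hypothetical key + suffix): no longer sorted.
        List<String> spill1Rewritten = List.of("a", "af", "ab");
        List<String> spill2 = List.of("ab");

        System.out.println(reduceCalls(merge(spill1Ok, spill2)));        // [a, ab]
        System.out.println(reduceCalls(merge(spill1Rewritten, spill2))); // [a, ab, af, ab]
    }
}
```

In the second run "ab" is reduced twice, which is one way a key-changing combiner can produce the "same key written more than once" symptom. A rewritten key can also belong to a different reducer than the one it is delivered to, because partitioning happened before the combiner ran.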
On Mon, Jan 13, 2014 at 3:06 PM, Devin Suiter RDX <[EMAIL PROTECTED]> wrote:

> Amit,
>
> Have you explored the ChainMapper class?
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>
>> Isn’t this what you’d normally do in the Mapper?
>>
>> My understanding of the combiner is that it is like a “mapper-side
>> pre-reducer” and operates on blocks of data that have already been sorted
>> by key, so mucking with the keys doesn’t **seem** like a good idea.
>>
>> john
>>
>> *From:* Amit Sela [mailto:[EMAIL PROTECTED]]
>> *Sent:* Sunday, January 12, 2014 9:26 AM
>> *To:* [EMAIL PROTECTED]
>> *Subject:* manipulating key in combine phase
>>
>> Hi all,
>>
>> I was wondering if it is possible to manipulate the key during combine:
>>
>> Say I have a MapReduce job where the key has many qualifiers.
>>
>> I would like to "split" the key into two (or more) keys if it has more
>> than, say, 100 qualifiers.
>>
>> In the combiner class I would do something like:
>>
>> int count = 0;
>> for (Writable value : values) {
>>   // the first 100 values keep the original key; any further values go to newKey
>>   if (++count > 100) {
>>     context.write(newKey, value);
>>   } else {
>>     context.write(key, value);
>>   }
>> }
>>
>> where newKey is something like key + randomUUID.
>>
>> I know that the combiner can be called "zero, once or more..." times, and I'm
>> getting strange results (the same key written more than once), so I would be
>> glad to get some deeper insight into how the combiner works.
>>
>> Thanks,
>> Amit.
>>
>
>
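John's suggestion of doing the split in the Mapper sidesteps the ordering problem: map output is partitioned and sorted after map() emits it, so a bucketed key is treated like any other key. Below is a minimal sketch of the bucketing logic, assuming String keys and a hypothetical `KeyBucketer` helper that a Mapper would call before `context.write(...)`; the class, the `#` separator and the per-task HashMap are illustrative choices, not Hadoop API. The suffix is a deterministic bucket number rather than a random UUID, so the reducer can strip it to recover the original key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): after every `limit` values seen
// for a key in this map task, the key moves to the next bucket ("key#0", "key#1", ...).
public class KeyBucketer {
    private final int limit;
    private final Map<String, Integer> seen = new HashMap<>();

    public KeyBucketer(int limit) {
        if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
        this.limit = limit;
    }

    // Returns the bucketed key for the next value emitted under `key`.
    public String bucketedKey(String key) {
        int n = seen.merge(key, 1, Integer::sum) - 1; // 0-based index of this value
        return key + "#" + (n / limit);
    }

    // Reducer-side inverse: drops the last "#<bucket>" suffix.
    public static String originalKey(String bucketed) {
        return bucketed.substring(0, bucketed.lastIndexOf('#'));
    }

    public static void main(String[] args) {
        KeyBucketer bucketer = new KeyBucketer(100);
        String last = null;
        for (int i = 0; i < 250; i++) {
            last = bucketer.bucketedKey("row1");
        }
        System.out.println(last);                          // row1#2
        System.out.println(KeyBucketer.originalKey(last)); // row1
    }
}
```

The counts are per map task, so one logical key can still arrive at the reducer in several buckets from different mappers; any combiner used with this scheme should leave keys unchanged. The map also grows with the number of distinct keys a task sees.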