HBase, mail # user - multiple puts in reducer?


Re: multiple puts in reducer?
Michel Segel 2012-02-29, 13:18
There is nothing wrong with writing the output from a reducer to HBase.

The question you have to ask yourself is why you are using a reducer in the first place. ;-)

Look, you have a database. Why do you need a reducer?

It's a simple question... Right? ;-)

Look, I apologize for being cryptic. This is one of those philosophical design questions where you, the developer/architect, have to figure out the answer for yourself.  Maybe I should submit this as an HBaseCon topic for a presentation?

Sort of like how to do an efficient table join in HBase....

HTH
Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 28, 2012, at 11:16 PM, Jacques <[EMAIL PROTECTED]> wrote:

> I see nothing wrong with writing the output of the reducer into HBase.  You
> just need to make sure duplicated operations wouldn't cause problems.  If
> using TableOutputFormat, don't use randomly seeded keys.  If working straight
> against HTable, don't use Increment.  We do this in some situations and
> either don't care about overwrites or use checkAndPut with a skip option in
> the application code.
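
A minimal sketch of that checkAndPut-with-skip idea, assuming the 0.92-era
HTable API; the table, family, and qualifier names ("metrics", "d", "count",
"done") are made up, and conf, rowKey, and total are assumed to be in scope:

    // Only apply the Put if the "done" marker cell is still absent, so a
    // retried reduce task cannot apply the same write twice.
    HTable table = new HTable(conf, "metrics");
    byte[] fam = Bytes.toBytes("d");
    Put put = new Put(rowKey);                        // deterministic key, not random
    put.add(fam, Bytes.toBytes("count"), Bytes.toBytes(total));
    put.add(fam, Bytes.toBytes("done"), Bytes.toBytes(true));
    boolean applied = table.checkAndPut(rowKey, fam,
        Bytes.toBytes("done"), null, put);            // null value = "only if absent"
    if (!applied) {
      // A previous attempt already wrote this row; skip it.
    }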
> On Feb 28, 2012 9:40 AM, "Ben Snively" <[EMAIL PROTECTED]> wrote:
>
>> Is there an assertion that you would never need to run a reducer when
>> writing to the DB?
>>
>> It seems that there are cases when you would not need one, but the general
>> statement doesn't apply to all use cases.
>>
>> If you were trying to process data where two map tasks (or a set of map
>> tasks) may output the same key, you could have a case where you need to
>> reduce the data for that key prior to inserting the result into HBase.
>>
>> Am I missing something? To me, that would be the deciding factor: whether
>> the key/values output by the map tasks are the exact values that need to
>> be inserted into HBase, or whether multiple values must be aggregated
>> together and the result put into the HBase entry.
>>
>> Thanks,
>> Ben
>>
>>
>> On Tue, Feb 28, 2012 at 11:20 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>
>>> The better question is why would you need a reducer?
>>>
>>> That's a bit cryptic, I understand, but you have to ask yourself when
>>> you need to use a reducer when you are writing to a database... ;-)
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 28, 2012, at 10:14 AM, "T Vinod Gupta" <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Mike,
>>>> I didn't understand - why would I not need a reducer in an HBase M/R
>>>> job? There can be such cases, right?
>>>> My use case is very similar to Sujee's blog on frequency counting -
>>>> http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
>>>> So in the reducer, I can do all the aggregations. Is there a better way?
>>>> I can think of another way - to use Increments in the map job itself. I
>>>> have to figure out if that's possible, though.
>>>>
>>>> thanks
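
For reference, a rough sketch of that aggregate-then-Put reducer, along the
lines of Sujee's frequency-counter article; it assumes the job was wired up
with TableMapReduceUtil.initTableReducerJob and the usual
org.apache.hadoop.hbase.mapreduce imports, and the class, family, and
qualifier names are illustrative:

    public static class FreqReducer
        extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
      @Override
      public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();                             // do all the aggregation here
        }
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("details"), Bytes.toBytes("total"), Bytes.toBytes(sum));
        context.write(null, put);                     // TableOutputFormat ignores the
      }                                               // key and applies the Put
    }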
>>>>
>>>> On Tue, Feb 28, 2012 at 7:44 AM, Michel Segel <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Yes, you can do it.
>>>>> But why do you have a reducer when running an M/R job against HBase?
>>>>>
>>>>> The trick to writing multiple rows... You do it independently of the
>>>>> output from the map() method.
>>>>>
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
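
One reading of that "independently of the output" trick, sketched under the
assumption that it means opening an HTable inside the task and writing Puts
directly instead of going through context.write; the table and column names
are invented, and the map signature assumes a plain text input:

    private HTable table;

    @Override
    protected void setup(Context context) throws IOException {
      table = new HTable(context.getConfiguration(), "mytable");
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(Bytes.toBytes(value.toString()));   // any row key you like
      put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(1L));
      table.put(put);                                 // as many rows as needed
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      table.close();                                  // flushes buffered writes
    }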
>>>>>
>>>>> On Feb 28, 2012, at 8:34 AM, T Vinod Gupta <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> While doing map/reduce on HBase tables, is it possible to do multiple
>>>>>> Puts in the reducer? What I want is a way to be able to write multiple
>>>>>> rows. If it's not possible, then what are the other alternatives? I
>>>>>> mean like creating a wider table in that case.
>>>>>>
>>>>>> thanks
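
To answer the literal question: yes, a single reduce() call can emit several
Puts, each with its own row key, when the job writes through
TableOutputFormat. A bare-bones sketch; the "-daily"/"-weekly" key suffixes,
column names, and count variables are all invented:

    // Two rows written from one reduce() call.
    Put daily = new Put(Bytes.toBytes(key.toString() + "-daily"));
    daily.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(dailyCount));
    context.write(null, daily);

    Put weekly = new Put(Bytes.toBytes(key.toString() + "-weekly"));
    weekly.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(weeklyCount));
    context.write(null, weekly);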
>>>>>
>>>
>>