HBase, mail # user - Coprocessor end point vs MapReduce?


Re: Coprocessor end point vs MapReduce?
Doug Meil 2012-10-18, 19:18

I agree with the concern, and there isn't a ton of guidance in this area
yet.

On 10/18/12 2:01 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:

>Doug,
>
>One thing that concerns me is that a lot of folks are gravitating to
>coprocessors and may be using them for the wrong thing.
>Has anyone done any research into the limitations and
>negative impacts of using coprocessors?
>
>While I haven't really toyed with the idea of bulk deletes, periodic
>deletes are probably not a good use of coprocessors... however, using them
>to synchronize tables would be a valid use case.
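Table synchronization is the classic trigger pattern: a hook that runs after every write and mirrors it elsewhere. The sketch below is a JDK-only model, not HBase code; in HBase the hook would live in a RegionObserver coprocessor's post-put callback, and ObservedTable, addPostPutHook and put here are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// JDK-only model of a trigger that keeps a second table in sync.
// Hypothetical stand-in: in HBase this hook would be a RegionObserver's post-put callback.
final class ObservedTable {
    // Row key -> value, kept sorted like HBase row keys.
    final Map<String, String> rows = new TreeMap<>();
    // Hooks run after every put (the "post-put" event).
    private final List<BiConsumer<String, String>> postPutHooks = new ArrayList<>();

    void addPostPutHook(BiConsumer<String, String> hook) {
        postPutHooks.add(hook);
    }

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
        // Hooks run inline on the write path, so their cost is added to every put.
        for (BiConsumer<String, String> hook : postPutHooks) {
            hook.accept(rowKey, value);
        }
    }

    public static void main(String[] args) {
        ObservedTable primary = new ObservedTable();
        ObservedTable mirror = new ObservedTable();
        // Synchronize: every write to primary is replayed into mirror.
        primary.addPostPutHook(mirror::put);
        primary.put("row1", "a");
        primary.put("row2", "b");
        System.out.println(mirror.rows); // {row1=a, row2=b}
    }
}
```

Because the hooks run inside put(), any slow work they do is paid on every single write, which is the kind of hidden cost behind the concern about coprocessors being used for the wrong thing.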
>
>Thx
>
>-Mike
>
>On Oct 18, 2012, at 7:36 AM, Doug Meil <[EMAIL PROTECTED]>
>wrote:
>
>>
>> To echo what Mike said about KISS, would you use triggers for a large
>> time-sensitive batch job in an RDBMS? It's possible, but probably not.
>> So you might want to think twice about using coprocessors for such a
>> purpose with HBase.
>>
>>
>> On 10/17/12 9:50 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
>>
>>> Run your weekly job in a low priority fair scheduler/capacity scheduler
>>> queue.
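A minimal sketch of pointing the weekly job at a low-priority queue, assuming the queue (called weekly_low here, a hypothetical name) is already defined in the scheduler's configuration with a small share. java.util.Properties stands in for Hadoop's Configuration so the sketch runs without Hadoop on the classpath; in a real driver the same keys would be passed to Configuration.set before the job is submitted. Which key is honored depends on the Hadoop version and scheduler setup (the Hadoop 1 fair scheduler, for example, picks its pool from the property named by mapred.fairscheduler.poolnameproperty), so treat these as common defaults rather than a guarantee.

```java
import java.util.Properties;

// Sketch: route the weekly job to a low-priority scheduler queue.
// Properties stands in for org.apache.hadoop.conf.Configuration so this runs
// without Hadoop; a real driver would call conf.set(...) with the same keys.
final class QueueSettings {
    // Hadoop 1.x (MRv1) key, current when this thread was written.
    static final String MRV1_QUEUE_KEY = "mapred.job.queue.name";
    // Hadoop 2.x+ key; newer versions treat the MRv1 key as a deprecated alias.
    static final String MRV2_QUEUE_KEY = "mapreduce.job.queuename";

    static Properties forQueue(String queueName) {
        Properties props = new Properties();
        // Setting both keys to the same value covers old and new clusters.
        props.setProperty(MRV1_QUEUE_KEY, queueName);
        props.setProperty(MRV2_QUEUE_KEY, queueName);
        return props;
    }

    public static void main(String[] args) {
        // "weekly_low" is a hypothetical queue that must already exist in the
        // fair/capacity scheduler configuration, with a small resource share.
        Properties props = forQueue("weekly_low");
        System.out.println(props.getProperty(MRV2_QUEUE_KEY)); // weekly_low
    }
}
```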
>>>
>>> Maybe it's just me, but I see coprocessors as structurally similar to
>>> RDBMS triggers and stored procedures.
>>> You need to restrain yourself and use them sparingly; otherwise you end up
>>> creating performance issues.
>>>
>>> Just IMHO.
>>>
>>> -Mike
>>>
>>> On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
>>> <[EMAIL PROTECTED]> wrote:
>>>
>>>> I don't have any concern about the time it's taking. It's more about
>>>> the load it's putting on the cluster. I have other jobs that I need to
>>>> run (secondary index, data processing, etc.). So the more time this
>>>> new job is taking, the less CPU the others will have.
>>>>
>>>> I tried the M/R job and I really liked the way it works. So my only
>>>> real concern is the performance of the delete part.
>>>>
>>>> That's why I'm wondering what the best practice is for moving a row to
>>>> another table.
>>>>
>>>> 2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
>>>>> If you're going to be running this weekly, I would suggest that you
>>>>> stick with the M/R job.
>>>>>
>>>>> Is there any reason why you need to be worried about the time it
>>>>> takes to do the deletes?
>>>>>
>>>>>
>>>>> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
>>>>> <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Hi Mike,
>>>>>>
>>>>>> I'm expecting to run the job weekly. I initially thought about using
>>>>>> endpoints because I found HBASE-6942, which was a good example for my
>>>>>> needs.
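The endpoint approach works per region: the client sends one request to each region, every region deletes its own matching rows locally, and only a count comes back. The JDK-only model below illustrates that shape under stated assumptions; RegionModel, deleteOlderThan and bulkDelete are hypothetical names, not the API added by HBASE-6942, and the thread pool merely imitates regions running on different servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// JDK-only model of endpoint-style bulk delete. Hypothetical names throughout;
// this is not the HBASE-6942 API, only the shape of the interaction.
final class EndpointStyleDelete {
    static final class RegionModel {
        // Row key -> cell timestamp (ms) for the rows this region hosts.
        final TreeMap<String, Long> rows = new TreeMap<>();

        // Runs "server side": delete local rows older than the cutoff, return the count.
        long deleteOlderThan(long cutoffMillis) {
            int before = rows.size();
            rows.values().removeIf(ts -> ts < cutoffMillis);
            return before - rows.size();
        }
    }

    // Client side: one call per region, run in parallel; only counts come back.
    static long bulkDelete(List<RegionModel> regions, long cutoffMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (RegionModel region : regions) {
                results.add(pool.submit(() -> region.deleteOlderThan(cutoffMillis)));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        RegionModel a = new RegionModel();
        a.rows.put("a1", 100L);
        a.rows.put("a2", 900L);
        RegionModel b = new RegionModel();
        b.rows.put("b1", 50L);
        System.out.println(bulkDelete(List.of(a, b), 500L)); // 2
    }
}
```

The trade-off discussed in this thread shows up here: the delete work runs inside the region servers themselves, competing with normal reads and writes, rather than in separate map tasks that a scheduler queue can throttle.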
>>>>>>
>>>>>> I'm fine with the Put part for the Map/Reduce, but I'm not sure about
>>>>>> the delete. That's why I looked at coprocessors. Then I figured that I
>>>>>> could also do the Put on the coprocessor side.
>>>>>>
>>>>>> In an M/R job, can I delete the row I'm dealing with based on some
>>>>>> criteria like timestamp? If I do that, I won't be doing bulk deletes;
>>>>>> I'll be deleting the rows one by one, right? That might be very slow.
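On the one-by-one concern: one common mitigation is to buffer deletes inside the map task and send them in batches. HBase's client exposes this as delete(List<Delete>) on a table, and later versions also offer BufferedMutator. The sketch below models the buffering with plain JDK collections; BatchingDeleter, batchSize and the flushes counter are hypothetical, with flushes standing in for round trips to the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// JDK-only model of batching deletes inside a map task instead of sending one
// request per row. "flushes" counts simulated round trips to the cluster.
final class BatchingDeleter {
    private final Set<String> table; // Row keys still present in the source table.
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    int flushes = 0;

    BatchingDeleter(Set<String> table, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.table = table;
        this.batchSize = batchSize;
    }

    // Called once per row the mapper decides to remove (e.g. timestamp below a cutoff).
    void delete(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends every pending delete as one "request"; call it again when the task ends.
    void flush() {
        if (pending.isEmpty()) return;
        table.removeAll(pending);
        pending.clear();
        flushes++;
    }

    public static void main(String[] args) {
        Set<String> source = new TreeSet<>(List.of("r1", "r2", "r3", "r4", "r5"));
        BatchingDeleter deleter = new BatchingDeleter(source, 2);
        for (String row : List.of("r1", "r2", "r3", "r4", "r5")) {
            deleter.delete(row);
        }
        deleter.flush(); // Don't lose the last partial batch.
        System.out.println(source.isEmpty() + " " + deleter.flushes); // true 3
    }
}
```

With a batch size of 2 and five rows, the model makes three round trips instead of five; the final flush() call matters, because without it the last partial batch would never be sent.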
>>>>>>
>>>>>> If in the future I want to run the job daily, might that be an issue?
>>>>>>
>>>>>> Or should I go with the initial idea of doing the Put with the M/R job
>>>>>> and the delete with HBASE-6942?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JM
>>>>>>
>>>>>>
>>>>>> 2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm a firm believer in KISS (Keep It Simple, Stupid).
>>>>>>>
>>>>>>> The map-only Map/Reduce job is the simplest option and the least
>>>>>>> prone to failure.
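The map-only suggestion can be sketched as a per-row function: each row of the source-table scan produces either nothing or a copy for the archive table plus a delete for the source table, and since rows never interact, no reduce phase is needed. This is a JDK-only model under assumptions: the Mutation record, the table names archive and source, and the timestamp cutoff are hypothetical. In a real job the emitted mutations would be written by an HBase output format (MultiTableOutputFormat, for example, routes each mutation to the table named by its output key), with the number of reduce tasks set to zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// JDK-only model of a map-only "move old rows" job. Hypothetical names: the
// Mutation record, the "archive"/"source" table names and the cutoff rule.
final class MoveRowsMapper {
    // One write for the output format: a put into `table`, or a delete from it when isDelete is true.
    record Mutation(String table, String rowKey, String value, boolean isDelete) {}

    private final long cutoffMillis;

    MoveRowsMapper(long cutoffMillis) {
        this.cutoffMillis = cutoffMillis;
    }

    // Called once per row of the source-table scan; rows never interact,
    // so no reduce phase is needed.
    List<Mutation> map(String rowKey, long timestampMillis, String value) {
        List<Mutation> out = new ArrayList<>();
        if (timestampMillis < cutoffMillis) {
            out.add(new Mutation("archive", rowKey, value, false)); // copy the row
            out.add(new Mutation("source", rowKey, null, true));    // then remove it
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> timestamps = new TreeMap<>(Map.of("old", 100L, "new", 900L));
        MoveRowsMapper mapper = new MoveRowsMapper(500L);
        List<Mutation> emitted = new ArrayList<>();
        for (Map.Entry<String, Long> row : timestamps.entrySet()) {
            emitted.addAll(mapper.map(row.getKey(), row.getValue(), "v-" + row.getKey()));
        }
        emitted.forEach(System.out::println);
    }
}
```

Because each call to map() depends only on its own row, the framework can split the scan by region and run the calls in parallel, which is what keeps the map-only version simple.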
>>>>>>>
>>>>>>> Not sure why you would want to do this using coprocessors.
>>>>>>>
>>>>>>> How often are you running this job? It sounds like it's going to be
>>>>>>> sporadic.
>>>>>>>
>>>>>>> -Mike
>>>>>>>
>>>>>>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
>>>>>>> <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Can someone please help me to understand the pros and cons between