HBase user mailing list: Coprocessor end point vs MapReduce?

RE: Coprocessor end point vs MapReduce?
CPs and endpoints operate at the region level; any operation confined to a single region can be performed with them. In the use case below, though, the delete is accompanied by a need to insert data into another table, and it is a periodic action. I really doubt that endpoints alone can be used here; I also lean towards the MR approach.

  The idea behind the bulk delete CP is simple. We have a use case where a bulk of rows must be deleted, and the delete needs to happen online. I have also seen many people on the mailing list asking about this. In every case, people were scanning, fetching the rowkeys to the client side, and then issuing the deletes, and most of the time the complaint was slowness. One bulk delete performance improvement was done in HBASE-6284. Still, we thought we could do the whole operation (scan + delete) on the server side and make use of endpoints for it. This is much faster and can be used for online bulk deletes.
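The round-trip difference between the two patterns can be sketched with a toy in-memory model (plain Java, not the HBase API; all names here are illustrative, and "RPCs" are just counted calls):

```java
import java.util.*;

// Toy sketch: contrasts the client-side scan-then-delete pattern with a
// server-side bulk delete done inside the "region" in a single call.
public class BulkDeleteSketch {
    // Simulated region: sorted rows keyed by rowkey.
    static NavigableMap<String, String> region = new TreeMap<>();

    // Client-side pattern: every matching rowkey is shipped back to the
    // client, then each Delete crosses the network again (worst case,
    // one RPC per row, plus the scan itself).
    static int clientSideDelete(String prefix) {
        List<String> keys = new ArrayList<>();
        for (String k : region.keySet())
            if (k.startsWith(prefix)) keys.add(k);   // rowkeys go to the client
        for (String k : keys) region.remove(k);      // one Delete "RPC" each
        return keys.size() + 1;
    }

    // Endpoint-style pattern: scan and delete both happen inside the
    // region, so the client issues a single call per region.
    static int endpointDelete(String prefix) {
        region.keySet().removeIf(k -> k.startsWith(prefix));
        return 1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) region.put("old-" + i, "v");
        region.put("keep-1", "v");
        System.out.println("client-side RPCs: " + clientSideDelete("old-"));  // 101
        for (int i = 0; i < 100; i++) region.put("old-" + i, "v");
        System.out.println("endpoint RPCs: " + endpointDelete("old-"));       // 1
        System.out.println("rows left: " + region.size());                    // 1
    }
}
```

The real endpoint (HBASE-6942) is of course a protobuf coprocessor service invoked once per region; the point of the sketch is only that the matching rows never leave the server.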


From: Michael Segel [[EMAIL PROTECTED]]
Sent: Thursday, October 18, 2012 11:31 PM
Subject: Re: Coprocessor end point vs MapReduce?


One thing that concerns me is that a lot of folks are gravitating to Coprocessors and may be using them for the wrong thing.
Has anyone done any sort of research as to some of the limitations and negative impacts on using coprocessors?

While I haven't really toyed with the idea of bulk deletes, periodic deletes are probably not a good use of coprocessors; however, using them to synchronize tables would be a valid use case.



On Oct 18, 2012, at 7:36 AM, Doug Meil <[EMAIL PROTECTED]> wrote:

> To echo what Mike said about KISS, would you use triggers for a large
> time-sensitive batch job in an RDBMS?  It's possible, but probably not.
> Then you might want to think twice about using co-processors for such a
> purpose with HBase.
> On 10/17/12 9:50 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
>> Run your weekly job in a low priority fair scheduler/capacity scheduler
>> queue.
>> Maybe it's just me, but I look at coprocessors as a structure similar to
>> RDBMS triggers and stored procedures.
>> You need to show restraint and use them sparingly; otherwise you end up
>> creating performance issues.
>> Just IMHO.
>> -Mike
>> On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
>> <[EMAIL PROTECTED]> wrote:
>>> I don't have any concern about the time it's taking. It's more about
>>> the load it's putting on the cluster. I have other jobs that I need to
>>> run (secondary indexing, data processing, etc.), so the more time this
>>> new job takes, the less CPU the others will get.
>>> I tried the M/R and I really liked the way it's done. So my only
>>> concern is really the performance of the delete part.
>>> That's why I'm wondering what the best practice is for moving a row to
>>> another table.
>>> 2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
>>>> If you're going to be running this weekly, I would suggest that you
>>>> stick with the M/R job.
>>>> Is there any reason why you need to be worried about the time it
>>>> takes to do the deletes?
>>>> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
>>>> wrote:
>>>>> Hi Mike,
>>>>> I'm expecting to run the job weekly. I initially thought about using
>>>>> endpoints because I found HBASE-6942, which was a good example for my
>>>>> needs.
>>>>> I'm fine with the Put part of the Map/Reduce, but I'm not sure about
>>>>> the delete. That's why I looked at coprocessors. Then I figured I
>>>>> could also do the Put on the coprocessor side.
>>>>> In an M/R job, can I delete the row I'm dealing with based on some
>>>>> criterion like the timestamp? If I do that, I won't be doing bulk
>>>>> deletes; I'll be deleting the rows one by one, right? That might be
>>>>> very slow.
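The move-then-delete step being discussed (Put the old row into another table, then delete it from the source) can be made cheaper by buffering the Deletes and sending them in batches instead of one per row. A toy in-memory sketch of that map-side pass (plain Java, not the HBase API; class and batch size are hypothetical):

```java
import java.util.*;

// Toy sketch: rows older than a cutoff timestamp are "Put" to an archive
// table and their Deletes are buffered so they go out in batches rather
// than one call per row.
public class MoveAndDeleteSketch {
    static final int BATCH = 3;                         // flush threshold
    static Map<String, Long> source = new TreeMap<>();  // rowkey -> timestamp
    static Map<String, Long> archive = new TreeMap<>();
    static int flushes = 0;                             // batched delete "RPCs"

    // "Mapper" pass over the source table.
    static void run(long cutoff) {
        List<String> pendingDeletes = new ArrayList<>();
        for (Map.Entry<String, Long> row : new ArrayList<>(source.entrySet())) {
            if (row.getValue() < cutoff) {
                archive.put(row.getKey(), row.getValue()); // the Put side
                pendingDeletes.add(row.getKey());          // queue the Delete
                if (pendingDeletes.size() >= BATCH) flush(pendingDeletes);
            }
        }
        flush(pendingDeletes);                             // final partial batch
    }

    static void flush(List<String> keys) {
        if (keys.isEmpty()) return;
        keys.forEach(source::remove);                      // one batched call
        keys.clear();
        flushes++;
    }

    public static void main(String[] args) {
        for (long t = 0; t < 7; t++) source.put("row-" + t, t);
        run(5);                                            // expire rows with ts < 5
        System.out.println(archive.size() + " moved, " + source.size()
                + " kept, " + flushes + " batched deletes");  // 5 moved, 2 kept, 2 batched deletes
    }
}
```

In real HBase client code the equivalent lever is handing a `List<Delete>` to the table in one call rather than issuing each Delete individually; the sketch only shows the buffering shape, not the actual API.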