RE: Coprocessor end point vs MapReduce?

Hi Jean
       >>Are all coprocessors running at the same time on all regions
Yes, it will try to run them all in parallel. It submits one callable for each region involved, using the Executor pool available in the HTable. So the actual degree of parallelism depends on the number of slots in that pool and the total number of regions.
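
To make the point about slots concrete, here is a minimal sketch in plain Java. It is only a model of the behaviour described above, not the HBase client API: RegionFanOut, callRegion and runOnAllRegions are hypothetical names, and the sleep stands in for the work done inside a region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model only: this is not the HBase client API.
class RegionFanOut {
    // Track how many region calls run at once, to show that the pool size caps it.
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for the endpoint call made against one region.
    static int callRegion(int regionId) {
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20); // pretend the region is doing some work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted in region " + regionId, e);
        } finally {
            running.decrementAndGet();
        }
        return 1; // one result per region
    }

    // Submit one callable per region and sum the per-region results.
    static int runOnAllRegions(int regionCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int r = 0; r < regionCount; r++) {
                final int regionId = r;
                Callable<Integer> task = () -> callRegion(regionId);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // wait for every region to answer
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("region call failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int total = runOnAllRegions(8, 3);
        System.out.println("regions answered: " + total
                + ", peak concurrency: " + peak.get());
    }
}
```

With 8 regions and 3 slots, every region is still called, but never more than 3 at the same time; the remaining callables wait in the pool's queue.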
>>The MapReduce framework guarantee me that it will run against
>> all the regions. I tried to stop a regionserver while the job was
>> running. The region moved, and the MapReduce restarted the job from
>> the new location. Will the coprocessor do the same thing
Yes, it will. Every call to a region is retried (up to 10 times by default).
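
As a rough sketch of what such a per-region retry looks like, again in plain Java and not taken from the HBase client code: RegionCallRetry and callWithRetries are hypothetical names, the limit of 10 is counted here as total attempts (an assumption), and the pause and region re-location that a real client would do between attempts are left out.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative model only: not the HBase client's actual retry code.
class RegionCallRetry {
    // Default mentioned in this thread; treated here as total attempts (assumption).
    static final int MAX_ATTEMPTS = 10;

    // Run the call, trying again on failure until maxAttempts is used up.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // e.g. the region moved; a real client would look up its
                // new location and wait a little before the next attempt.
                last = e;
            }
        }
        throw new IllegalStateException(
                "gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated region call that fails twice, then succeeds.
        String result = callWithRetries(() -> {
            if (++attempts[0] < 3) {
                throw new IOException("region not serving");
            }
            return "ok";
        }, MAX_ATTEMPTS);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

Since each region gets its own callable, a region that moves only causes retries for that one call; the calls to the other regions are not affected.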
One other point that just came to mind: what happens if a region splits in between? How will MR handle that case? Sorry, I don't know. I need to look at the code.

Regarding your use case Jean,
You want to put some data into another table, right? I wonder how you plan to use a CP for that Put. As for the bulk delete, as you said, with an MR job it amounts to scanning the rows to the client side and deleting them one by one (though your mappers act as many parallel clients). So, as you expect, it will be very slow compared to the new approach we are working on in HBASE-6942.

Hope I have answered your questions. :)

From: Michael Segel [[EMAIL PROTECTED]]
Sent: Thursday, October 18, 2012 7:20 AM
Subject: Re: Coprocessor end point vs MapReduce?

Run your weekly job in a low priority fair scheduler/capacity scheduler queue.

Maybe it's just me, but I look at Coprocessors as a similar structure to RDBMS triggers and stored procedures.
You need to show restraint and use them sparingly; otherwise you end up creating performance issues.

Just IMHO.


On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> I don't have any concern about the time it's taking. It's more about
> the load it's putting on the cluster. I have other jobs that I need to
> run (secondary index, data processing, etc.). So the more time this
> new job is taking, the less CPU the others will have.
> I tried the M/R and I really liked the way it's done. So my only
> concern will really be the performance of the delete part.
> That's why I'm wondering what's the best practice to move a row to
> another table.
> 2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
>> If you're going to be running this weekly, I would suggest that you stick
>> with the M/R job.
>> Is there any reason why you need to be worried about the time it takes to do
>> the deletes?
>> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
>> wrote:
>>> Hi Mike,
>>> I'm expecting to run the job weekly. I initially thought about using
>>> end points because I found HBASE-6942 which was a good example for my
>>> needs.
>>> I'm fine with the Put part for the Map/Reduce, but I'm not sure about
>>> the delete. That's why I look at coprocessors. Then I figure that I
>>> also can do the Put on the coprocessor side.
>>> On a M/R, can I delete the row I'm dealing with based on some criteria
>>> like timestamp? If I do that, I will not do bulk deletes, but I will
>>> delete the rows one by one, right? Which might be very slow.
>>> If in the future I want to run the job daily, might that be an issue?
>>> Or should I go with the initial idea of doing the Put with the M/R job
>>> and the delete with HBASE-6942?
>>> Thanks,
>>> JM
>>> 2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
>>>> Hi,
>>>> I'm a firm believer in KISS (Keep It Simple, Stupid)
>>>> The Map/Reduce (map job only) is the simplest and least prone to
>>>> failure.
>>>> Not sure why you would want to do this using coprocessors.
>>>> How often are you running this job? It sounds like it's going to be
>>>> sporadic.
>>>> -Mike
>>>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari