HBase, mail # user - Coprocessor end point vs MapReduce?


Jean-Marc Spaggiari 2012-10-18, 00:11
Michael Segel 2012-10-18, 00:27
Jean-Marc Spaggiari 2012-10-18, 01:19
Michael Segel 2012-10-18, 01:31
Jean-Marc Spaggiari 2012-10-18, 01:44
Re: Coprocessor end point vs MapReduce?
Michael Segel 2012-10-18, 01:50
Run your weekly job in a low priority fair scheduler/capacity scheduler queue.
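As a sketch of that suggestion (the jar name, driver class, and queue/pool name below are all hypothetical, and the property names are the MRv1-era ones):

```shell
# Capacity scheduler: submit the weekly job to a low-priority queue
hadoop jar archive-job.jar com.example.ArchiveJob \
  -Dmapred.job.queue.name=low \
  source_table archive_table

# Fair scheduler (MRv1): the equivalent knob is the pool name
hadoop jar archive-job.jar com.example.ArchiveJob \
  -Dmapred.fairscheduler.pool=low \
  source_table archive_table
```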

Maybe it's just me, but I look at coprocessors as similar in structure to RDBMS triggers and stored procedures.
You need to restrain yourself and use them sparingly; otherwise you end up creating performance issues.

Just IMHO.

-Mike

On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> I don't have any concern about the time it's taking. It's more about
> the load it's putting on the cluster. I have other jobs that I need to
> run (secondary index, data processing, etc.). So the more time this
> new job takes, the less CPU the others will have.
>
> I tried the M/R and I really liked the way it's done. So my only
> concern will really be the performance of the delete part.
>
> That's why I'm wondering what's the best practice to move a row to
> another table.
>
> 2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
>> If you're going to be running this weekly, I would suggest that you stick
>> with the M/R job.
>>
>> Is there any reason why you need to be worried about the time it takes to do
>> the deletes?
>>
>>
>> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi Mike,
>>>
>>> I'm expecting to run the job weekly. I initially thought about using
>>> end points because I found HBASE-6942 which was a good example for my
>>> needs.
>>>
>>> I'm fine with the Put part for the Map/Reduce, but I'm not sure about
>>> the delete. That's why I looked at coprocessors. Then I figured that I
>>> can also do the Put on the coprocessor side.
>>>
>>> In a M/R job, can I delete the row I'm dealing with based on some criteria
>>> like timestamp? If I do that, I won't be doing bulk deletes; I will
>>> delete the rows one by one, right? That might be very slow.
>>>
>>> If in the future I want to run the job daily, might that be an issue?
>>>
>>> Or should I go with the initial idea of doing the Put with the M/R job
>>> and the delete with HBASE-6942?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>>
>>> 2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
>>>> Hi,
>>>>
>>>> I'm a firm believer in KISS (Keep It Simple, Stupid).
>>>>
>>>> A Map/Reduce (map-only) job is the simplest approach and the least prone
>>>> to failure.
>>>>
>>>> Not sure why you would want to do this using coprocessors.
>>>>
>>>> How often are you running this job? It sounds like it's going to be
>>>> sporadic.
>>>>
>>>> -Mike
>>>>
>>>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
>>>> <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Can someone please help me understand the pros and cons of these
>>>>> two options for the following use case?
>>>>>
>>>>> I need to transfer all the rows between 2 timestamps to another table.
>>>>>
>>>>> My first idea was to run a MapReduce to map the rows and store them on
>>>>> another table, and then delete them using an end point coprocessor.
>>>>> But the more I look into it, the more I think the MapReduce is not a
>>>>> good idea and I should use a coprocessor instead.
>>>>>
>>>>> BUT... The MapReduce framework guarantees that it will run against
>>>>> all the regions. I tried to stop a regionserver while the job was
>>>>> running. The region moved, and the MapReduce restarted the job from
>>>>> the new location. Will the coprocessor do the same thing?
>>>>>
>>>>> Also, I found the web console for MapReduce showing the number of
>>>>> jobs, their status, etc. Is there the same thing for coprocessors?
>>>>>
>>>>> Are all coprocessors running at the same time on all regions, which
>>>>> means we could have 100 of them running on a regionserver at a time? Or
>>>>> do they run like the MapReduce jobs, based on some configured
>>>>> values?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> JM
>>>>>
>>>>
>>>>
>>>
>>
>>
>
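As an aside on the copy half of this use case: it is close to what HBase's bundled CopyTable MapReduce job already does with a time range. A sketch (the table names and timestamps below are placeholders):

```shell
# Copy rows whose cell timestamps fall in [starttime, endtime)
# from source_table into archive_table
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --starttime=1349913600000 \
  --endtime=1350518400000 \
  --new.name=archive_table \
  source_table
```

The deletes would still be a separate pass: either per-row Delete objects emitted from a mapper (TableOutputFormat accepts Deletes as well as Puts), or the bulk delete endpoint discussed in HBASE-6942.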
Anoop Sam John 2012-10-18, 04:20
Doug Meil 2012-10-18, 12:36
Michael Segel 2012-10-18, 18:01
Doug Meil 2012-10-18, 19:18
Anoop Sam John 2012-10-19, 03:33
lohit 2012-10-19, 03:58
Jean-Marc Spaggiari 2012-10-25, 13:01
Anoop John 2012-10-25, 17:13
Jerry Lam 2012-10-25, 20:43