HBase user mailing list: Coprocessor end point vs MapReduce?

Earlier messages in this thread:
Jean-Marc Spaggiari 2012-10-18, 00:11
Michael Segel 2012-10-18, 00:27

Re: Coprocessor end point vs MapReduce?
Hi Mike,

I'm expecting to run the job weekly. I initially thought about using
endpoints because I found HBASE-6942, which was a good example for my
needs.

I'm fine with the Put part in the Map/Reduce, but I'm not sure about
the delete. That's why I looked at coprocessors. Then I figured that I
could also do the Put on the coprocessor side.

In an M/R job, can I delete the row I'm dealing with based on some
criterion like the timestamp? If I do that, I won't be doing bulk
deletes; I'll be deleting the rows one by one, right? That might be
very slow.
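
For concreteness, here is a rough sketch of that map-only move-and-delete,
written against the 0.94-era HBase client API. The table names, column
handling and tuning values are made up for illustration, and error handling
is omitted, so treat it as a starting point rather than a finished job:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MoveRowsByTimeRange {

  // Map-only job: each mapper copies the row into the archive table, then
  // deletes it from the source table, one row at a time.
  static class MoveMapper extends TableMapper<NullWritable, NullWritable> {
    private HTable source;
    private HTable archive;

    @Override
    protected void setup(Context context) throws IOException {
      Configuration conf = context.getConfiguration();
      source = new HTable(conf, "source_table");    // hypothetical table names
      archive = new HTable(conf, "archive_table");
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException {
      // Copy every cell of the row into the archive table.
      Put put = new Put(row.get());
      for (KeyValue kv : result.raw()) {
        put.add(kv);
      }
      archive.put(put);
      // Then delete the whole row from the source: one Delete RPC per row.
      source.delete(new Delete(row.get()));
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      source.close();
      archive.close();
    }
  }

  public static void main(String[] args) throws Exception {
    long minTs = Long.parseLong(args[0]);
    long maxTs = Long.parseLong(args[1]);

    Configuration conf = HBaseConfiguration.create();
    // Side-effecting mappers and speculative execution don't mix.
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);

    Job job = new Job(conf, "move-rows-" + minTs + "-" + maxTs);
    job.setJarByClass(MoveRowsByTimeRange.class);

    // Only cells with timestamps in [minTs, maxTs) are scanned.
    Scan scan = new Scan();
    scan.setTimeRange(minTs, maxTs);
    scan.setCaching(500);
    scan.setCacheBlocks(false);

    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        MoveMapper.class, NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With auto-flush on (the default for HTable), every Put and Delete above is
its own RPC, which is exactly the row-by-row slowness you're worried about;
collecting the Deletes into a List<Delete> and sending them with a single
delete(List) call every few thousand rows would at least reduce the round
trips.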

If in the future I want to run the job daily, might that be an issue?

Or should I go with the initial idea of doing the Put with the M/R job
and the delete with HBASE-6942?
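
If you do go with an endpoint for the delete, the client side would look
roughly like the sketch below, using the pre-0.96 coprocessorExec() fan-out
(the coprocessor RPC mechanism changed to protobuf services in 0.96). The
TimeRangeDeleteProtocol interface here is hypothetical, standing in for
whatever HBASE-6942's endpoint actually exposes; the point is just that the
call is dispatched once to every region of the table and you get one result
back per region:

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

public class TimeRangeDeleteClient {

  // Hypothetical endpoint protocol: the real HBASE-6942 endpoint defines its
  // own interface, so adapt the name and signature to match what is deployed.
  public interface TimeRangeDeleteProtocol extends CoprocessorProtocol {
    long deleteRowsInTimeRange(long minTs, long maxTs) throws IOException;
  }

  public static void main(String[] args) throws Throwable {
    final long minTs = Long.parseLong(args[0]);
    final long maxTs = Long.parseLong(args[1]);

    HTable table = new HTable(HBaseConfiguration.create(), "source_table");
    try {
      // coprocessorExec() invokes the endpoint once on every region whose key
      // range falls between the given start and end rows (empty rows = the
      // whole table) and collects one result per region.
      Map<byte[], Long> deletedPerRegion = table.coprocessorExec(
          TimeRangeDeleteProtocol.class,
          HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW,
          new Batch.Call<TimeRangeDeleteProtocol, Long>() {
            public Long call(TimeRangeDeleteProtocol endpoint) throws IOException {
              return endpoint.deleteRowsInTimeRange(minTs, maxTs);
            }
          });

      long total = 0;
      for (Long count : deletedPerRegion.values()) {
        total += count;
      }
      System.out.println("Deleted " + total + " rows across "
          + deletedPerRegion.size() + " regions");
    } finally {
      table.close();
    }
  }
}

This assumes the endpoint class is already deployed on the source table (via
the table descriptor or hbase-site.xml); on the server side the
implementation would scan its own region with the same time range and issue
the deletes locally, which is what makes it faster than deleting row by row
from a client.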

Thanks,

JM
2012/10/17, Michael Segel <[EMAIL PROTECTED]>:
> Hi,
>
> I'm a firm believer in KISS (Keep It Simple, Stupid).
>
> A Map/Reduce job (map-only) is the simplest approach and the least prone to failure.
>
> Not sure why you would want to do this using coprocessors.
>
> How often are you running this job? It sounds like it's going to be
> sporadic.
>
> -Mike
>
> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> wrote:
>
>> Hi,
>>
>> Can someone please help me understand the pros and cons of these
>> two options for the following use case?
>>
>> I need to transfer all the rows between 2 timestamps to another table.
>>
>> My first idea was to run a MapReduce job to map the rows and store them in
>> another table, and then delete them using an endpoint coprocessor.
>> But the more I look into it, the more I think the MapReduce is not a
>> good idea and I should use a coprocessor instead.
>>
>> BUT... The MapReduce framework guarantees me that it will run against
>> all the regions. I tried to stop a regionserver while the job was
>> running. The region moved, and the MapReduce restarted the task from
>> the new location. Will the coprocessor do the same thing?
>>
>> Also, I found the web console for MapReduce with the number of
>> jobs, their status, etc. Is there anything similar for coprocessors?
>>
>> Do all coprocessors run at the same time on all regions, which
>> means we could have 100 of them running on a regionserver at a time? Or
>> do they run like MapReduce tasks, based on some configured
>> values?
>>
>> Thanks,
>>
>> JM
>>
>
>

Later messages in this thread:
Michael Segel 2012-10-18, 01:31
Jean-Marc Spaggiari 2012-10-18, 01:44
Michael Segel 2012-10-18, 01:50
Anoop Sam John 2012-10-18, 04:20
Doug Meil 2012-10-18, 12:36
Michael Segel 2012-10-18, 18:01
Doug Meil 2012-10-18, 19:18
Anoop Sam John 2012-10-19, 03:33
lohit 2012-10-19, 03:58
Jean-Marc Spaggiari 2012-10-25, 13:01
Anoop John 2012-10-25, 17:13
Jerry Lam 2012-10-25, 20:43