HBase >> mail # user >> Remove the row in MR job?


Re: Remove the row in MR job?

I'm not entirely sure of the use-case, but here are some thoughts on this...

re:  "should I take the table from the pool, and simply call the delete
method?"

Yep, you can construct an HTable instance within an MR job.  But use the
delete that takes a list, because the single-row delete will invoke an RPC
for each call (not great over an MR job).

Construct the HTable instance at the Mapper level (not map-method level)
and keep a buffer of deletes in a List, sending the list whenever it
reaches a reasonable batch size.  At the end of each map task, send any
unprocessed deletes in the cleanup method.
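
A rough sketch of that buffering pattern, to make the flow concrete. It is
plain Java with no HBase dependency so it can be read and run on its own:
BufferedDeleter, its batchSize parameter, and the sink callback are
hypothetical names, and row keys are plain Strings. In a real Mapper the
sink would be a call to the HTable delete that takes a list, add() would be
called from map(), and flush() from cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of HBase: buffers row keys and hands them
// to a sink in batches. The sink stands in for HTable's list-based delete.
class BufferedDeleter {
    private final int batchSize;
    private final Consumer<List<String>> sink;
    private final List<String> buffer = new ArrayList<>();

    BufferedDeleter(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Would be called once per row from map(): queue the row key and
    // send the batch as soon as the buffer is full.
    void add(String rowKey) {
        buffer.add(rowKey);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Would be called from cleanup(): send whatever is still buffered.
    // Does nothing when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the sink may keep or modify the batch safely.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With a batch size of 3 and 7 rows, the sink is called twice from add() and
once more from the final flush(), so the map method never needs to know
which row is the last one.
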

I'm not entirely sure why you'd want to delete every row in a table, as
opposed to processing all the rows in Table1, generating an entirely new
Table2, and then dropping Table1 when you're done with it.  That seems
like it would be less hassle than deleting every row (since the table
ends up empty anyway).

On 10/12/12 1:20 PM, "Jean-Marc Spaggiari" <[EMAIL PROTECTED]> wrote:

>Hi,
>
>I have a table which I want to parse over a MR job.
>
>Today, I'm using a scan to parse all the rows. Each row is retrieved,
>removed, and then parsed (feeding 2 other tables).
>
>The goal is to parse all the content while some process might still be
>adding some more.
>
>In the map method of the MR job, can I delete the row I'm working
>with? If so, how should I do it? Should I take the table from the pool,
>and simply call the delete method? The issue is, doing a delete for
>each line will take a while. I would prefer to batch them, but I don't
>know when the last line will be, so it's difficult to know when to
>send the batch.  Is there a way to tell the MR job to delete this
>line? Also, what's the impact on the MR job if I delete the row it's
>working on?
>
>Or is the MR job not the best way to do that?
>
>Thanks,
>
>JM
>