
HBase >> mail # user >> Remove the row in MR job?

Remove the row in MR job?

I have a table that I want to process with a MapReduce (MR) job.

Today, I'm using a scan to go through all the rows. Each row is retrieved,
removed, and then parsed (feeding 2 other tables).

The goal is to parse all the content while some other process might still
be adding more rows.

In the map method of the MR job, can I delete the row I'm working
on? If so, how should I do it? Should I take the table from the pool
and simply call the delete method? The issue is that doing a delete for
each row will take a while. I would prefer to batch them, but I don't
know which row will be the last one, so it's difficult to know when to
send the batch. Is there a way to tell the MR job to delete this
row? Also, what's the impact on the MR job if I delete the row it's
working on?
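
One possible way around the "when do I send the last batch" problem: Hadoop's Mapper has a cleanup() method that the framework calls once per task, after the last map() call. A buffer of pending deletes could therefore be flushed whenever it fills up, and one final time from cleanup(). The sketch below shows only that buffering logic in plain Java. DeleteBatcher, its batchSize parameter, and the sink callback are hypothetical names used for illustration; in a real job the sink would presumably wrap a batch call such as HTable.delete(List<Delete>).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects row keys and hands them to a sink in batches.
final class DeleteBatcher implements AutoCloseable {
    private final int batchSize;
    // Assumption: the sink wraps the real batch delete against the table.
    private final Consumer<List<String>> sink;
    private final List<String> buffer = new ArrayList<>();

    DeleteBatcher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Intended to be called from map() once the row has been processed.
    void add(String rowKey) {
        buffer.add(rowKey);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; does nothing when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, because the buffer is reused
        buffer.clear();
    }

    // Intended to be called from cleanup() so the last partial batch is not lost.
    @Override
    public void close() {
        flush();
    }
}
```

With a batch size of 2 and rows r1, r2, r3, the sink would receive [r1, r2] during the map phase and [r3] when close() is called at the end of the task.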

Or is an MR job not the best way to do this?