I have a table that I want to parse with an MR job.
Today, I'm using a scan to parse all the rows. Each row is retrieved,
removed, and then parsed (feeding 2 other tables).
The goal is to parse all the content while other processes might still
be adding more.
In the map method of the MR job, can I delete the row I'm working
with? If so, how should I do it? Should I take the table from the pool
and simply call the delete method? The issue is that doing a delete for
each line will take a while. I would prefer to batch them, but I don't
know which line will be the last one, so it's difficult to know when to
send the batch. Is there a way to tell the MR job to delete this
line? Also, what's the impact on the MR job if I delete the row it's
working on?
Or is the MR job not the best way to do that?
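On the "when to send the last batch" point: in the new Hadoop API, a Mapper's cleanup(Context) hook runs once after the task's last record, so a partial batch can be flushed there. The sketch below shows only that buffering pattern, framework-free. DeleteBatcher is a hypothetical helper, and the sink is an assumption: in a real job it might wrap a batched call such as the HBase client's delete(List<Delete>), with add() called from map() and close() from cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: buffers row keys and hands them to a sink in batches.
 * Intended use (assumption): add() from map(), close() from cleanup().
 */
public class DeleteBatcher implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<String>> sink; // e.g. wraps a bulk delete (assumption)
    private final List<String> pending = new ArrayList<>();

    public DeleteBatcher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queue one row key; flush automatically once the batch is full. */
    public void add(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Send whatever is buffered, even a partial batch. */
    public void flush() {
        if (pending.isEmpty()) return;
        sink.accept(new ArrayList<>(pending)); // hand the sink a copy
        pending.clear();
    }

    /** End of task: the final, possibly partial, batch is not lost. */
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        try (DeleteBatcher batcher = new DeleteBatcher(2, batches::add)) {
            for (String row : new String[] {"row1", "row2", "row3"}) {
                batcher.add(row);
            }
        }
        System.out.println(batches); // prints [[row1, row2], [row3]]
    }
}
```

The batch size is the trade-off knob: larger batches mean fewer round trips, but more rows are re-processed if a task fails before its next flush.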
Doug Meil 2012-10-12, 18:41
Jean-Marc Spaggiari 2012-10-12, 19:47
Doug Meil 2012-10-12, 20:01
Jean-Marc Spaggiari 2012-10-13, 23:22