HBase >> mail # user >> Remove the row in MR job?


Remove the row in MR job?
Hi,

I have a table which I want to parse with an MR job.

Today, I'm using a scan to parse all the rows. Each row is retrieved,
removed, and then parsed (feeding 2 other tables).

The goal is to parse all the content while some process might still be
adding some more.

In the map method of the MR job, can I delete the row I'm working
with? If so, how should I do it? Should I take the table from the pool
and simply call the delete method? The issue is that doing a delete for
each line will take a while. I would prefer to batch them, but I don't
know which line will be the last, so it's difficult to know when to
send the batch. Is there a way to tell the MR job to delete this
line? Also, what's the impact on the MR job if I delete the row it's
working on?

Or is the MR job not the best way to do that?

Thanks,

JM
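
One possible way around the "when to send the batch" problem raised above, as a rough sketch only: Hadoop's Mapper calls cleanup() once after the last map() call of a task, so a buffer can be flushed whenever it fills up and then one final time from cleanup(). The DeleteBatcher class below is a hypothetical helper, not part of HBase or Hadoop, and it uses plain String row keys and a generic sink in place of real Delete objects so that it runs on its own. In an actual job the sink would presumably build the Delete list and pass it to the table's batch delete call. It says nothing about how deleting rows affects the scan that is still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not an HBase or Hadoop class): buffers row keys and
 * hands them to a sink in batches of at most batchSize.
 */
class DeleteBatcher implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<String>> sink;
    private final List<String> pending = new ArrayList<>();

    DeleteBatcher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Intended to be called once per row from map(); flushes when the buffer is full. */
    void add(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; does nothing when the buffer is empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink owns its batch
        pending.clear();
    }

    /** Intended to be called from the mapper's cleanup(): sends the final partial batch. */
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        // Stand-in sink that only prints; a real job would issue the batched deletes here.
        try (DeleteBatcher batcher =
                 new DeleteBatcher(2, batch -> System.out.println("delete " + batch))) {
            for (String row : new String[] {"row1", "row2", "row3"}) {
                batcher.add(row);
            }
        }
    }
}
```

With this shape the mapper never needs to know which row is the last one: full batches go out as they fill, and the leftover rows go out when the task ends.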