HBase >> mail # user >> running MR job and puts on the same table


running MR job and puts on the same table
I have a use case where I push data into my HTable in waves, followed by a
mapper-only MR job over that table. Currently, as soon as a row is processed
in map(), I mark it as processed: inside map() I execute a
table.put(isprocessed=true) for that row. I am not sure that modifying the
table like this is a good idea, and I am also concerned that I am writing to
the same table the MR job is scanning.
So I am thinking of another approach: accumulate the processed row keys in a
list (or a more compact data structure) and use the mapper's cleanup() method
to execute all the table.put(isprocessed=true) updates at once.
What is the suggested best practice?

- R
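The accumulate-and-flush approach described in the message can be sketched in plain Java. This is an illustrative stand-in, not the real HBase client API: ProcessedBuffer, markProcessed, and the flusher callback are hypothetical names, and in a real TableMapper the flusher would issue a batched table.put(List<Put>) while cleanup() would call flush() one last time. A bounded flush size is added so the buffer cannot grow without limit on large regions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative buffer that accumulates "mark as processed" updates and
// pushes them out in batches, instead of one table.put() per map() call.
class ProcessedBuffer {
    private final List<String> rowKeys = new ArrayList<>();
    private final int flushSize;
    // Stand-in for the batched write, e.g. table.put(List<Put>) in HBase.
    private final Consumer<List<String>> flusher;

    ProcessedBuffer(int flushSize, Consumer<List<String>> flusher) {
        this.flushSize = flushSize;
        this.flusher = flusher;
    }

    // Called from map(): remember the row, flush when the batch is full
    // so memory stays bounded even for very large input splits.
    void markProcessed(String rowKey) {
        rowKeys.add(rowKey);
        if (rowKeys.size() >= flushSize) {
            flush();
        }
    }

    // Called from the mapper's cleanup() to push any remaining updates.
    void flush() {
        if (!rowKeys.isEmpty()) {
            flusher.accept(new ArrayList<>(rowKeys));
            rowKeys.clear();
        }
    }
}
```

Note that HBase's client already offers write-side batching of its own (for example, client-side write buffering on the HTable API), so in practice the same effect may be achievable by configuring the client rather than hand-rolling a buffer; the sketch above only shows the shape of the cleanup-time batching the message proposes.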