I have a use case where I push data into my HTable in waves, each followed by
mapper-only processing. Currently, as soon as a row is processed in map(), I
immediately mark it as processed=true by executing a
table.put(isprocessed=true) inside map(). I am not sure whether modifying the
table like this is a good idea. I am also concerned that I am modifying the
same table that the MR job is reading from.
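To make the comparison concrete, here is a minimal Python model of the current approach. It is not the Hadoop/HBase Java API: `FakeTable` is a hypothetical in-memory stand-in for the HTable, and `map()`/`cleanup()` only mirror the names of the Mapper lifecycle methods. Each processed row costs one `put()` call:

```python
from typing import Dict, List, Tuple

Put = Tuple[str, str, str]  # hypothetical (row_key, column, value) triple


class FakeTable:
    """Hypothetical in-memory stand-in for an HTable: row key -> {column: value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}
        self.put_calls = 0  # each call models one client write request

    def put(self, puts: List[Put]) -> None:
        self.put_calls += 1
        for row_key, column, value in puts:
            self.rows.setdefault(row_key, {})[column] = value


class PerRowMapper:
    """Current approach: mark each row as processed immediately inside map()."""

    def __init__(self, table: FakeTable) -> None:
        self.table = table

    def map(self, row_key: str) -> None:
        # ... process the row here ...
        self.table.put([(row_key, "isprocessed", "true")])  # one write per row

    def cleanup(self) -> None:
        pass  # nothing left to write


table = FakeTable()
mapper = PerRowMapper(table)
for key in ["r1", "r2", "r3"]:
    mapper.map(key)
mapper.cleanup()
print(table.put_calls)  # 3
```

In this model each row is marked the moment it is processed, at the cost of one write per row.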
So I am considering another approach: accumulate the processed row keys in a
list (or a more compact data structure) and use the mapper's cleanup() method
to execute all the table.put(isprocessed=true) calls at once.
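The accumulate-then-write idea can be sketched with the same kind of Python model (again not the real Java API; `FakeTable`, `BufferedMapper` and `batch_size` are hypothetical). As an assumption beyond the plain "everything in cleanup()" version, this sketch also flushes whenever the buffer reaches `batch_size`, so memory stays bounded on large input splits:

```python
from typing import Dict, List, Tuple

Put = Tuple[str, str, str]  # hypothetical (row_key, column, value) triple


class FakeTable:
    """Hypothetical in-memory stand-in for an HTable: row key -> {column: value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}
        self.put_calls = 0  # each call models one client write request

    def put(self, puts: List[Put]) -> None:
        self.put_calls += 1
        for row_key, column, value in puts:
            self.rows.setdefault(row_key, {})[column] = value


class BufferedMapper:
    """Proposed approach: buffer processed row keys and write the markers in batches."""

    def __init__(self, table: FakeTable, batch_size: int = 1000) -> None:
        self.table = table
        self.batch_size = batch_size  # assumed cap so the buffer cannot grow without bound
        self.pending: List[str] = []

    def map(self, row_key: str) -> None:
        # ... process the row here ...
        self.pending.append(row_key)
        if len(self.pending) >= self.batch_size:
            self._flush()

    def cleanup(self) -> None:
        # Runs once after the last map() call; write whatever is still buffered.
        self._flush()

    def _flush(self) -> None:
        if self.pending:
            self.table.put([(key, "isprocessed", "true") for key in self.pending])
            self.pending.clear()


table = FakeTable()
mapper = BufferedMapper(table, batch_size=2)
for key in ["r1", "r2", "r3"]:
    mapper.map(key)
mapper.cleanup()
print(table.put_calls)  # 2: a batch of two rows, then one row in cleanup()
```

Here three rows produce two writes instead of three. The trade-off in this model is that markers are only written when a batch fills or when cleanup() runs, so rows handled by a task that dies before flushing are left unmarked.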
What is the suggested best practice?