Using timestamps as "transaction ids" for idempotent counters.
David Koch 2012-08-24, 14:47
I use a table for counting stuff and want to do updates by pushing
increments rather than get -> add in application -> put.
To ensure idempotence (i.e. avoid over-counting) I thought about (mis-)using
a cell's timestamp as a kind of <transaction id>. This transaction id would
be some strictly increasing number defined by the application writing the
increments, so let's call it <external_tmst>. I am looking for a call like:
incrementColumnValue(<row>, <colFam>, <counter_name>, <inc_value>,
<external_tmst>) //normal signature is without last argument
which applies the <inc_value> ONLY IF <external_tmst> is larger than the
cell's most recent version's timestamp (== last transaction id). This way,
if the external application attempts to re-insert the same data multiple
times no change would take place.
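To make the intended semantics concrete, here is a minimal in-memory sketch
in Java. It is NOT the HBase client API: the class IdempotentCounter, its
Cell holder and the five-argument incrementColumnValue are hypothetical names
used only to illustrate the "apply only if <external_tmst> is larger" rule.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only. Nothing here talks to HBase; all names are
// hypothetical and exist just to show the conditional-increment rule.
final class IdempotentCounter {

    // One counter cell: its current value and the timestamp (used as a
    // transaction id) of the last increment that was actually applied.
    private static final class Cell {
        long value = 0L;
        long lastTmst = Long.MIN_VALUE;
    }

    private final Map<List<String>, Cell> cells = new HashMap<>();

    // Applies incValue only if externalTmst is strictly greater than the
    // timestamp of the last applied increment. Returns the counter value
    // after the call, whether or not the increment was applied.
    synchronized long incrementColumnValue(String row, String colFam,
                                           String counterName,
                                           long incValue, long externalTmst) {
        List<String> key = Arrays.asList(row, colFam, counterName);
        Cell cell = cells.computeIfAbsent(key, k -> new Cell());
        if (externalTmst > cell.lastTmst) {
            cell.value += incValue;
            cell.lastTmst = externalTmst;
        }
        return cell.value;
    }

    public static void main(String[] args) {
        IdempotentCounter table = new IdempotentCounter();
        // First delivery of transaction 100 is applied.
        System.out.println(table.incrementColumnValue("row1", "cf", "hits", 5, 100L)); // 5
        // Re-sending transaction 100 changes nothing.
        System.out.println(table.incrementColumnValue("row1", "cf", "hits", 5, 100L)); // 5
        // A newer transaction is applied.
        System.out.println(table.incrementColumnValue("row1", "cf", "hits", 3, 101L)); // 8
    }
}
```

One consequence of this rule: an increment that arrives out of order, with a
timestamp lower than one already applied, is dropped as well, so the writer
has to deliver increments for a given counter in increasing order.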
My questions are:
1. Is this a good idea to begin with?
2. Does the HBase client offer this kind of functionality? If not, is it
planned, or can it be implemented?
It appears that co-processors are able to handle this kind of logic, but I
think I will be stuck with 0.90.6, which does not support them, for a while.
I also heard about HBaseHUT (https://github.com/sematext/HBaseHUT) but I am
not sure it addresses the issue of having idempotent counters.