HBase >> mail # user >> Using timestamps as "transaction ids" for idempotent counters.

Using timestamps as "transaction ids" for idempotent counters.

I use a table for counting stuff and want to do updates by pushing
increments rather than get -> add in application -> put.

To ensure idempotence (i.e., avoid over-counting), I thought about (mis-)using
a cell's timestamp as a kind of <transaction id>. This transaction id would
be some strictly increasing number defined by the application writing the
increments, so let's call it <external_tmst>. I am looking for a call like:

incrementColumnValue(<row>, <colFam>, <counter_name>, <inc_value>,
<external_tmst>) //normal signature is without last argument

which applies the <inc_value> ONLY IF <external_tmst> is larger than the
timestamp of the cell's most recent version (== last transaction id). This way,
if the external application attempts to re-insert the same data multiple
times, no change would take place.
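
To make the semantics I have in mind concrete, here is a minimal in-memory
sketch in plain Java. It is not HBase API and does not touch HBase at all: the
class IdempotentCounter, its Cell holder and the increment method are
hypothetical names used only to illustrate the "apply only if the timestamp is
newer" rule, and the row and column family are collapsed into a single counter
name for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the conditional increment described above (hypothetical, not HBase API). */
class IdempotentCounter {

    /** Counter value plus the timestamp (transaction id) of the last applied increment. */
    private static final class Cell {
        long value;
        long lastTmst;

        Cell(long value, long lastTmst) {
            this.value = value;
            this.lastTmst = lastTmst;
        }
    }

    private final Map<String, Cell> cells = new HashMap<String, Cell>();

    /**
     * Applies incValue only if externalTmst is strictly larger than the
     * timestamp stored with the counter. Returns the counter value after the call.
     */
    public synchronized long increment(String counterName, long incValue, long externalTmst) {
        Cell cell = cells.get(counterName);
        if (cell == null) {
            // First increment for this counter: always applied.
            cells.put(counterName, new Cell(incValue, externalTmst));
            return incValue;
        }
        if (externalTmst > cell.lastTmst) {
            // Newer transaction id: apply the increment and remember the id.
            cell.value += incValue;
            cell.lastTmst = externalTmst;
        }
        // Same or older transaction id: replayed data, leave the counter untouched.
        return cell.value;
    }

    public static void main(String[] args) {
        IdempotentCounter counter = new IdempotentCounter();
        System.out.println(counter.increment("clicks", 5, 100)); // prints 5
        System.out.println(counter.increment("clicks", 5, 100)); // replay, prints 5
        System.out.println(counter.increment("clicks", 3, 101)); // prints 8
    }
}
```

Note that this rule also drops an increment that arrives late with a smaller
<external_tmst>, so it only works if the writing application really delivers
its transaction ids in strictly increasing order per counter.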

My questions are:
1. Is this a good idea to begin with?
2. Does the HBase client offer this kind of functionality? If not, is it
planned, or can it be implemented?

It appears that coprocessors are able to handle this kind of logic, but I
think I will be stuck with 0.90.6 for a while. I also heard about HBaseHUT
(https://github.com/sematext/HBaseHUT), but I am not sure it addresses the
issue of having idempotent counters.

Thank you,