HBase dev mailing list

Re: Delete client API.


<< memstore-ts will just be the time at which kv arrived into
memstore and not the real ordering of the operation.>>

Because the RegionServer is the only entity writing updates for a given
key (each key belongs to exactly one RS), this will also be a global
ordering of the operations for that key. Of course, we need to make the
memstore-ts a logical number instead of the time on that RS if we want to
account for clock skew between RSs when a region fails over to another RS.
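A minimal sketch of the logical memstore-ts idea, assuming a per-region counter that, on failover, is seeded from the highest sequence number recovered for that region (for example, from its write-ahead log). The class name `RegionSequence` is hypothetical and this is not HBase's implementation. It only shows why a logical counter keeps per-key ordering monotonic across a failover, even when the new RS's wall clock lags the old one's.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not HBase's implementation: a per-region logical
// clock used as the memstore-ts instead of the RegionServer's wall clock.
final class RegionSequence {
    private final AtomicLong last;

    // Seed with the highest sequence number already recovered for this
    // region (0 for a brand-new region).
    RegionSequence(long maxRecoveredSeq) {
        this.last = new AtomicLong(maxRecoveredSeq);
    }

    // Assigns the next sequence number to an incoming edit. Only the one RS
    // hosting the region calls this, so numbers are totally ordered per key.
    long next() {
        return last.incrementAndGet();
    }

    public static void main(String[] args) {
        RegionSequence first = new RegionSequence(0);
        long a = first.next();
        long b = first.next();
        long c = first.next();

        // Failover: the new RS seeds from the max recovered number, so its
        // edits sort after the old ones even if its wall clock is behind.
        RegionSequence second = new RegionSequence(c);
        long d = second.next();

        System.out.println(a + " " + b + " " + c + " " + d); // prints "1 2 3 4"
    }
}
```

Seeding from recovered state, rather than from `System.currentTimeMillis()`, is what removes the dependence on clock skew between RSs.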

@Ian: you are correct in reasoning that "if there were no client
specified timestamps then this issue would never exist". But for better or
worse, HBase overloads timestamps and versions, so there is no inherent
way to achieve versioning other than by using timestamps.

Also, I don't understand what overhead it would add to each call. There is
a memstore-ts maintained anyway...
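To make the disagreement concrete, here is a toy model of a single cell; it is not HBase code, and all names are hypothetical (records need Java 16+). `TIMESTAMP` models the current behavior discussed in this thread: a delete with timestamp T masks every version with timestamp <= T regardless of when that version arrived, until a major compaction drops the delete marker. `ARRIVAL` is one reading of option B, which is not spelled out in this excerpt: a delete masks only versions that reached the memstore before it, using the arrival sequence number as the memstore-ts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one cell (row + column), NOT HBase code. Each edit carries a
// user-visible timestamp (ts) and an arrival sequence number (seq) assigned
// by the single RegionServer that owns the key.
final class CellModel {
    enum Policy { TIMESTAMP, ARRIVAL }

    record Put(long ts, long seq, String value) {}
    record Delete(long ts, long seq) {}

    private final Policy policy;
    private final List<Put> puts = new ArrayList<>();
    private final List<Delete> deletes = new ArrayList<>();
    private long nextSeq = 1;

    CellModel(Policy policy) { this.policy = policy; }

    void put(long ts, String value) { puts.add(new Put(ts, nextSeq++, value)); }

    // Models a delete of all versions with timestamp <= ts.
    void delete(long ts) { deletes.add(new Delete(ts, nextSeq++)); }

    private boolean masked(Put p) {
        for (Delete d : deletes) {
            boolean tsCovered = p.ts() <= d.ts();
            boolean arrivedBefore = p.seq() < d.seq();
            // TIMESTAMP ignores arrival order; ARRIVAL also requires it.
            if (tsCovered && (policy == Policy.TIMESTAMP || arrivedBefore)) {
                return true;
            }
        }
        return false;
    }

    // Returns the value of the newest visible version, or null if none.
    String get() {
        Put best = null;
        for (Put p : puts) {
            if (!masked(p) && (best == null || p.ts() > best.ts())) best = p;
        }
        return best == null ? null : best.value();
    }

    // Physically drops masked puts, then all delete markers.
    void majorCompact() {
        puts.removeIf(this::masked);
        deletes.clear();
    }

    public static void main(String[] args) {
        for (Policy policy : Policy.values()) {
            CellModel putFirst = new CellModel(policy);
            putFirst.put(1, "v1");
            putFirst.delete(2);

            CellModel deleteFirst = new CellModel(policy);
            deleteFirst.delete(2);
            deleteFirst.put(1, "v1");

            CellModel compactedBetween = new CellModel(policy);
            compactedBetween.delete(2);
            compactedBetween.majorCompact();
            compactedBetween.put(1, "v1");

            System.out.println(policy + ": " + putFirst.get() + " "
                    + deleteFirst.get() + " " + compactedBetween.get());
        }
        // TIMESTAMP: null null v1
        // ARRIVAL: null v1 v1
    }
}
```

Under `TIMESTAMP` the put-first and delete-first runs agree, which is the order-independence Lars relies on when HTable.batch reorders Deletes relative to Puts; but the result flips if a major compaction happens to run between the delete and the put. Under `ARRIVAL` the result depends on arrival order and not on compaction timing.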

On 1/18/12 6:38 AM, "Ian Varley" <[EMAIL PROTECTED]> wrote:

>M.C., why would option B be superior to simply letting the native
>timestamps in HBase do what they were meant to do, and then storing your
>app-level logical timestamps in the cell itself along with the data? The
>(admittedly more correct) behavior you want is already the normal
>behavior when you're not setting application-defined timestamps.
>
>In other words: HBase already has a timestamp that behaves as you
>describe, and only when you intentionally use it for another purpose does
>the behavior become non-intuitive. And, other things will become
>non-intuitive too, like replication.
>
>In the FB messaging case, if I'm not mistaken, the official timestamp
>value is in use for something that isn't a timestamp at all (message ids,
>or something along those lines). So in that case, it would make sense
>that you'd want to also have another timestamp. I'm tempted to assert
>that that's an unusual use of the timestamp field, but then again, if the
>biggest use case of a product does something, it's hardly "unusual". :)
>
>At the very least, since it would add overhead to every cell, this should
>be an opt-in behavior (the ability to say, "I'm setting my own
>timestamps, so HBase should also keep its own real timestamp"). But then
>again, what's the argument for doing that rather than storing the
>timestamps in your cell value? Is it the added abilities the API gives
>you around time ranges?
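Ian's alternative can be sketched as follows, assuming the application prefixes each cell value with its own 8-byte logical version and leaves the HBase timestamp unset so the server assigns one. `VersionedValue` and the byte layout are hypothetical; the encoded `byte[]` would simply be written as the cell value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: stores an app-level logical version inside the cell
// value instead of in the HBase timestamp.
final class VersionedValue {
    final long appVersion;
    final byte[] payload;

    VersionedValue(long appVersion, byte[] payload) {
        this.appVersion = appVersion;
        this.payload = payload;
    }

    // Layout: 8-byte big-endian app version, followed by the raw payload.
    byte[] encode() {
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(appVersion)
                .put(payload)
                .array();
    }

    // Inverse of encode(): reads the version, then the remaining bytes.
    static VersionedValue decode(byte[] cellValue) {
        ByteBuffer buf = ByteBuffer.wrap(cellValue);
        long version = buf.getLong();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new VersionedValue(version, payload);
    }

    public static void main(String[] args) {
        byte[] stored = new VersionedValue(42L,
                "hello".getBytes(StandardCharsets.UTF_8)).encode();
        VersionedValue back = VersionedValue.decode(stored);
        System.out.println(back.appVersion + " "
                + new String(back.payload, StandardCharsets.UTF_8)); // prints "42 hello"
    }
}
```

The trade-off Ian raises shows up here: HBase's time-range filtering applies to the native timestamp, not to `appVersion`, so any comparison on the application version has to happen client-side after decoding.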
>On Jan 18, 2012, at 1:51 AM, M. C. Srivas wrote:
>>On Tue, Jan 17, 2012 at 8:56 PM, lars hofhansl wrote:
>>>The memstoreTS is used for visibility during an intra-row transaction.
>>>Are you proposing to do this only if the deletes/puts did not use the
>>>current time?
>>>
>>>The ability to define timestamps for all operations is crucial to HBase.
>>>o It ensures that HTable.batch works correctly (which reorders Deletes
>>>  with respect to Puts at the Region Server).
>>>o It ensures that replication works correctly.
>>>o Many other scenarios.
>>>
>>>If you do not use application-defined timestamps, the current time is
>>>used and everything works as expected.
>>>If you use application-defined timestamps, you are asking for a delete
>>>to be either in the future or the past, and you have to understand what
>>>that means.
>>>Maybe we should document the behavior better.
>>
>>I guess I am saying that I *do* understand the current "delete with TS"
>>behavior, and I find the current implementation unstable and
>>non-deterministic. Documenting it more thoroughly does not make it less
>>quirky or more stable. I propose fixing it along the lines suggested in
>>option B. Karthik seems to agree.
>>
>>>-- Lars
>>>----- Original Message -----
>>>From: Karthik Ranganathan
>>><[EMAIL PROTECTED]>; lars hofhansl <
>>>Sent: Tuesday, January 17, 2012 3:27 PM
>>>Subject: Re: Delete client API.