HBase >> mail # dev >> Delete client API.


Re: Delete client API.

@Srivas - totally agree that B is the correct thing to do.

One way we have talked about implementing this is using the memstore ts.
Every insert of a KV into the memstore is given a memstore-ts. These are
persisted only till they are needed (to ensure read atomicity for
scanners) and then that value is zeroed out on a subsequent compaction
(saves space). If we retained the memstore-ts even beyond these
compactions, we could get a deterministic order for the puts and deletes
(first insert ts < del ts < second insert ts).
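To make the difference concrete, here is a rough toy model in Python (not HBase code). It replays the sequence "insert val1 at TS=6, delete row at TS=6, insert val2 at TS=6", with and without a compaction in the middle. The names `ToyRow`, `masked_by_ts`, `masked_by_ts_and_seq` and the `seq` field are made up for illustration; `seq` is only a stand-in for a retained memstore-ts, and the compaction here is a simplification that drops tombstones together with the puts they mask.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Cell:
    kind: str                    # "put" or "delete"
    ts: int                      # client-settable timestamp
    seq: int                     # hypothetical server-assigned order (stand-in for memstore-ts)
    value: Optional[str] = None


def masked_by_ts(put: Cell, tomb: Cell) -> bool:
    # Timestamp-only rule: a tombstone hides every put at its TS or older.
    return tomb.ts >= put.ts


def masked_by_ts_and_seq(put: Cell, tomb: Cell) -> bool:
    # Assumed proposal: the tombstone must also have arrived after the put.
    return tomb.ts >= put.ts and tomb.seq > put.seq


class ToyRow:
    """Toy single-column row; an illustration only."""

    def __init__(self, masked: Callable[[Cell, Cell], bool]):
        self.masked = masked
        self.cells: List[Cell] = []
        self._seq = 0

    def _append(self, kind: str, ts: int, value: Optional[str] = None) -> None:
        self._seq += 1
        self.cells.append(Cell(kind, ts, self._seq, value))

    def put(self, ts: int, value: str) -> None:
        self._append("put", ts, value)

    def delete(self, ts: int) -> None:
        self._append("delete", ts)

    def _visible_puts(self) -> List[Cell]:
        tombs = [c for c in self.cells if c.kind == "delete"]
        return [c for c in self.cells
                if c.kind == "put" and not any(self.masked(c, t) for t in tombs)]

    def read(self) -> Optional[str]:
        puts = self._visible_puts()
        if not puts:
            return None
        # Newest timestamp wins; ties are broken by arrival order.
        return max(puts, key=lambda c: (c.ts, c.seq)).value

    def major_compact(self) -> None:
        # Simplified: drop tombstones and the puts they mask.
        self.cells = self._visible_puts()


def run(masked: Callable[[Cell, Cell], bool], compact: bool) -> Optional[str]:
    row = ToyRow(masked)
    row.put(6, "val1")
    row.delete(6)
    if compact:
        row.major_compact()
    row.put(6, "val2")
    return row.read()


if __name__ == "__main__":
    for rule in (masked_by_ts, masked_by_ts_and_seq):
        print(rule.__name__, run(rule, compact=False), run(rule, compact=True))
```

In this toy, the timestamp-only rule gives a different read result depending on whether the compaction ran, while the rule that also compares arrival order gives the same result either way.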

Thanks
Karthik
On 1/17/12 2:14 PM, "M. C. Srivas" <[EMAIL PROTECTED]> wrote:

>On Tue, Jan 17, 2012 at 10:07 AM, lars hofhansl <[EMAIL PROTECTED]>
>wrote:
>
>> Yeah, it's confusing if one expects it to work like in a relational
>> database.
>> You can even do worse. If you accidentally place a delete in the future,
>> all current inserts will be hidden until the next major compaction. :)
>> I got confused about this myself just recently (see my mail on the
>> dev-list).
>>
>>
>> In the end this is a pretty powerful feature and core to how HBase works
>> (not saying it is not confusing, though).
>>
>>
>> If one keeps the following two points in mind it makes more sense:
>> 1. Delete just sets a tombstone marker at a specific TS (marking
>> everything at that TS or older as deleted).
>> 2. Everything is versioned; if no version is specified, the current time
>> (at the regionserver) is used.
>>
>> In your example1 below t3 > 6, hence the insert is hidden.
>> In example2 both delete and insert TS are 6, hence the insert is hidden.
>>
>
>Let's consider my example2 for a little longer. Sequence of events:
>
>   1.  ins  val1  with TS=6 set by client
>   2.  del  entire row at TS=6 set by client
>   3.  ins  val2  with TS=6  set by client
>   4.  read row
>
>The row returns nothing even though the insert at step 3 happened after
>the delete at step 2. (Step 2 masks even future inserts.)
>
>Now, the same sequence with a compaction thrown in the middle:
>
>   1.  ins  val1  with TS=6 set by client
>   2.  del  entire row at TS=6 set by client
>   3.  ---- table is compacted -----
>   4.  ins  val2  with TS=6  set by client
>   5.  read row
>
>The row returns val2.  (The delete at step 2 got lost due to compaction.)
>
>So we have different results depending upon whether an internal
>re-organization (like a compaction) happened or not. If we want both
>sequences to behave exactly the same, then we need to first choose what is
>the proper (and deterministic) behavior.
>
>A.  If we think that the first sequence is the correct one, then the
>delete at step 2 needs to be preserved forever.
>
>or,
>
>B. If we think that the second sequence is the correct behavior (i.e., a
>read always produces the same results independent of compaction), then the
>record needs a second "internal TS" field to allow the RS to distinguish
>the real sequence of events, and not rely upon the TS field which is
>settable by the client.
>
>My opinion:
>
>We should do B.  It is normal for someone to write code that says  "if old
>exists, delete it;  add new". A subsequent read should always reliably
>return "new".
>
>The current way of relying on a client-settable TS field to determine
>causal order results in quirky behavior, and quirky is not good.
>
>
>
>> Look at these two examples:
>>
>> 1. insert Val1  at real time t1
>> 2. <del>  at real time t2 > t1
>> 3. insert  Val2 at real time  t3 > t2
>>
>> 1. insert Val1  with TS=1 at real time t1
>> 2. <del>  with TS = 2 at real time t2 > t1
>>
>> 3. insert  Val2 with TS = 3 at real time  t3 > t2
>>
>>
>> In both cases Val2 is visible.
>>
>> If your code sets its own timestamps, you had better know what you're
>> doing :)
>>
>> Note that my examples below are confusing even if you know how deletion
>> in HBase works.
>> You have to look at Delete.java to figure out what is happening.
>> OK, since there were no objections in two days, I will commit my
>> proposed change in HBASE-5205.
>>
>>
>> -- Lars
>