HBase, mail # dev - Delete client API.


Re: Delete client API.
Karthik Ranganathan 2012-01-17, 23:27

@Srivas - totally agree that B is the correct thing to do.

One way we have talked about implementing this is to use the memstore ts.
Every KV inserted into the memstore is given a memstore-ts. These are
persisted only as long as they are needed (to ensure read atomicity for
scanners), and the value is then zeroed out on a subsequent compaction
(saves space). If we retained the memstore-ts even beyond these
compactions, we could get a deterministic order for the puts and deletes
(first insert ts < del ts < second insert ts).
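
A rough illustration of the idea, as a toy Python model rather than HBase code: each cell keeps the client timestamp plus a retained arrival sequence number standing in for the memstore-ts. The names Cell, seq and read_row are made up for this sketch, and the masking rule (a delete hides only puts that arrived before it) is an assumption about how option B might behave.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    ts: int                # client-settable timestamp
    seq: int               # hypothetical retained memstore-ts (arrival order)
    value: Optional[str]   # None marks a delete tombstone


def read_row(cells: List[Cell]) -> Optional[str]:
    """Return the visible value, using seq to order puts and deletes."""
    tombstones = [c for c in cells if c.value is None]
    # Assumed rule: a tombstone masks a put only if it covers the put's
    # timestamp AND arrived after the put.
    visible = [
        c for c in cells
        if c.value is not None
        and not any(d.ts >= c.ts and d.seq > c.seq for d in tombstones)
    ]
    if not visible:
        return None
    # Newest client timestamp wins; equal timestamps fall back to arrival order.
    return max(visible, key=lambda c: (c.ts, c.seq)).value


# ins val1, del row, ins val2, all with client TS=6
cells = [Cell(6, 1, "val1"), Cell(6, 2, None), Cell(6, 3, "val2")]
result = read_row(cells)
```

Under these assumptions the read depends only on the stored (ts, seq) pairs, so it comes out the same whether or not a compaction has run in between, provided the compaction keeps seq.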

Thanks
Karthik
On 1/17/12 2:14 PM, "M. C. Srivas" <[EMAIL PROTECTED]> wrote:

>On Tue, Jan 17, 2012 at 10:07 AM, lars hofhansl <[EMAIL PROTECTED]>
>wrote:
>
>> Yeah, it's confusing if one expects it to work like in a relational
>> database.
>> You can even do worse. If you by accident place a delete in the future,
>> all current inserts will be hidden until the next major compaction. :)
>> I got confused about this myself just recently (see my mail on the
>> dev-list).
>>
>>
>> In the end this is a pretty powerful feature and core to how HBase works
>> (not saying that is not confusing though).
>>
>>
>> If one keeps the following two points in mind it makes more sense:
>> 1. Delete just sets a tombstone marker at a specific TS (marking
>> everything older as deleted).
>> 2. Everything is versioned; if no version is specified, the current time
>> (at the regionserver) is used.
>>
>> In your example1 below t3 > 6, hence the insert is hidden.
>> In example2 both delete and insert TS are 6, hence the insert is hidden.
>>
>
>Let's consider my example2 for a little longer. Sequence of events:
>
>   1.  ins  val1  with TS=6 set by client
>   2.  del  entire row at TS=6 set by client
>   3.  ins  val2  with TS=6  set by client
>   4.  read row
>
>The row returns nothing even though the insert at step 3 happened after
>the delete at step 2. (step 2 masks even future inserts)
>
>Now, the same sequence with a compaction thrown in the middle:
>
>   1.  ins  val1  with TS=6 set by client
>   2.  del  entire row at TS=6 set by client
>   3.  ---- table is compacted -----
>   4.  ins  val2  with TS=6  set by client
>   5.  read row
>
>The row returns val2.  (the delete at step2 got lost due to compaction).
>
>So we have different results depending upon whether an internal
>re-organization (like a compaction) happened or not. If we want both
>sequences to behave exactly the same, then we need to first choose what is
>the proper (and deterministic) behavior.
>
>A.  if we think that the first sequence is the correct one, then the
>delete at step 2 needs to be preserved forever.
>
>or,
>
>B. if we think that the second sequence is the correct behavior (i.e., a
>read always produces the same results independent of compaction), then the
>record needs a second "internal TS" field to allow the RS to distinguish
>the real sequence of events, and not rely upon the TS field which is
>settable by the client.
>
>My opinion:
>
>We should do B.  It is normal for someone to write code that says  "if old
>exists, delete it;  add new". A subsequent read should always reliably
>return "new".
>
>The current way of relying on a client-settable TS field to determine
>causal order results in quirky behavior, and quirky is not good.
>
>
>
>> Look at these two examples:
>>
>> 1. insert Val1  at real time t1
>> 2. <del>  at real time t2 > t1
>> 3. insert  Val2 at real time  t3 > t2
>>
>> 1. insert Val1  with TS=1 at real time t1
>> 2. <del>  with TS = 2 at real time t2 > t1
>>
>> 3. insert  Val2 with TS = 3 at real time  t3 > t2
>>
>>
>> In both cases Val2 is visible.
>>
>> If your code sets its own timestamps, you better know what you're
>> doing :)
>>
>> Note that my examples below are confusing even if you know how deletion
>> in HBase works.
>> You have to look at Delete.java to figure out what is happening.
>> OK, since there were no objections in two days, I will commit my
>> proposed change in HBASE-5205.
>>
>>
>> -- Lars
>
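
To make the two quoted sequences concrete, here is a small Python simulation of the behavior described in the thread. It is a toy model, not HBase code: the Row class and its methods are invented for illustration, and it assumes only what the thread states, namely that a delete at TS masks every put with a timestamp at or below TS regardless of arrival order, and that a compaction drops the tombstone together with the puts it masks.

```python
class Row:
    """Toy model of one row: puts and delete tombstones keyed by client TS."""

    def __init__(self):
        self.puts = []        # list of (ts, value)
        self.tombstones = []  # list of delete timestamps

    def put(self, ts, value):
        self.puts.append((ts, value))

    def delete(self, ts):
        self.tombstones.append(ts)

    def _visible(self):
        # A tombstone hides every put whose ts is <= the tombstone's ts,
        # no matter which one was written first.
        cutoff = max(self.tombstones, default=None)
        return [(ts, v) for ts, v in self.puts if cutoff is None or ts > cutoff]

    def compact(self):
        # Assumed compaction effect: masked puts and tombstones both disappear.
        self.puts = self._visible()
        self.tombstones = []

    def read(self):
        visible = self._visible()
        if not visible:
            return None
        # Newest timestamp wins; ties between equal timestamps are not modeled.
        return max(visible, key=lambda p: p[0])[1]


def without_compaction():
    row = Row()
    row.put(6, "val1")
    row.delete(6)
    row.put(6, "val2")
    return row.read()


def with_compaction():
    row = Row()
    row.put(6, "val1")
    row.delete(6)
    row.compact()
    row.put(6, "val2")
    return row.read()
```

In this model without_compaction() returns nothing and with_compaction() returns val2, which is the discrepancy the thread is about.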