|
Wei Tan
2012-08-02, 22:35
lars hofhansl
2012-08-02, 23:33
Wei Tan
2012-08-03, 03:02
Ted Yu
2012-08-03, 03:31
Wei Tan
2012-08-03, 14:21
lars hofhansl
2012-08-03, 19:11
Wei Tan
2012-08-03, 19:43
lars hofhansl
2012-08-03, 20:05
Vladimir Rodionov
2012-08-03, 21:27
lars hofhansl
2012-08-03, 21:44
Wei Tan
2012-08-03, 21:58
Wei Tan
2012-08-06, 14:25
|
-
memstore timestamp and visible timestamp
Wei Tan
2012-08-02, 22:35
Hi,

I have a question regarding the correlation between the visible timestamp of a KV (denoted ts) and its memstore timestamp (a.k.a. the write number, denoted memts). Reading the HRegion.java code, it seems that the two are assigned independently.

Let's assume two concurrent puts, (k, v1) and (k, v2), and suppose that somehow memts(k, v1) < memts(k, v2), so that (k, v1) is committed and becomes visible before (k, v2).

If ts(k, v1) < ts(k, v2), then after both KVs commit, (k, v2) becomes the latest version.

Otherwise, if ts(k, v1) > ts(k, v2), then after the "later" (w.r.t. MVCC) KV commits, it immediately becomes stale and is still not visible. Is this a desirable feature?

Am I understanding correctly that memts(k, v1) < memts(k, v2) does not imply ts(k, v1) < ts(k, v2), and vice versa?

PS: let's talk about the visible timestamp assigned by the HBase region server, not one assigned by the user.

Thanks,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
19 Skyline Dr, Hawthorne, NY 10532
[EMAIL PROTECTED]; 914-784-6752
-
Re: memstore timestamp and visible timestamp
lars hofhansl
2012-08-02, 23:33
Hi Wei,

you have to distinguish between "visible to other concurrent scanners" and "visible to a client".

What is visible to a client is determined by what the client wants to see, based on the application-visible timestamp (TS). Visibility to concurrent scanners is controlled by the memstoreTS (mTS) to avoid "strange" states due to parallel updates. Here HBase guards against partially visible "transactions" (i.e. a Put of many columns that fails after it has applied the changes to some of the columns).

The scenario you describe is indeed desired. Note that a client can request the older versions too, so the older edit (in terms of TS) is not lost.

Also note that if you use region-server-assigned TSs, then mTS1 < mTS2 implies TS1 <= TS2 (the updates might happen within the same ms).

If you do not mind a longer read, I have written about this here: http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html

Let me know if that makes any sense.

-- Lars
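The distinction Lars draws can be sketched with a toy model. The following is not HBase code: the class `VisibilitySketch`, its `Cell` type, and the notion of a numeric read point compared against the memstore timestamp are assumptions made purely for illustration. The timestamps are chosen by hand, as a client assigning its own TSs could, to reproduce the case ts(k, v1) > ts(k, v2) with memts(k, v1) < memts(k, v2).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of read-point visibility. Not HBase code; all names are hypothetical. */
class VisibilitySketch {

    /** One version of a cell: ts is the application-visible timestamp, memTs the internal write number. */
    static final class Cell {
        final String value;
        final long ts;
        final long memTs;

        Cell(String value, long ts, long memTs) {
            this.value = value;
            this.ts = ts;
            this.memTs = memTs;
        }
    }

    /** Values a scanner with the given read point may see, ordered newest ts first. */
    static List<String> visibleVersions(List<Cell> cells, long readPoint) {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (c.memTs <= readPoint) { // memTs gates visibility to concurrent scanners
                visible.add(c);
            }
        }
        // ts alone decides which visible version counts as the latest.
        visible.sort(Comparator.comparingLong((Cell c) -> c.ts).reversed());
        List<String> values = new ArrayList<>();
        for (Cell c : visible) {
            values.add(c.value);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("v1", 200L, 1L)); // commits first, carries the larger ts
        row.add(new Cell("v2", 100L, 2L)); // commits second, carries the smaller ts

        System.out.println(visibleVersions(row, 0L)); // neither write is committed yet
        System.out.println(visibleVersions(row, 1L)); // only v1 is committed
        System.out.println(visibleVersions(row, 2L)); // both committed; v2 is only an older version
    }
}
```

In this sketch the write that commits second never becomes the latest version, yet it remains retrievable as an older version, which matches the behaviour described as desired above.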
-
memstore timestamp and visible timestamp
Wei Tan
2012-08-03, 03:02
(Resend of the original question above.)
-
Re: memstore timestamp and visible timestamp
Ted Yu
2012-08-03, 03:31
Lars H replied to you this afternoon. Please check his reply.
-
Re: memstore timestamp and visible timestamp
Wei Tan
2012-08-03, 14:21
Hi Lars,

I appreciate your reply. Actually, I had read your blog post, and that is what prompted the question. I am very interested in how you guarantee this:

"Also note that if you use the Region Server assigned TSs then mTS1<mTS2 implies TS1<=TS2 (the update might happen with the same ms)."

If you have a pointer explaining this, I would like to read it. Otherwise I will dig into the code later today. I remember reading the 0.92.0 code and not finding much of a clue, but I will try again.

Best Regards,
Wei
-
Re: memstore timestamp and visible timestamp
lars hofhansl
2012-08-03, 19:11
I see. This is not so much a stated guarantee as a fact that follows from the implementation.

The memTS is handed out per region server, which is fine, because the only consistency guarantee HBase makes is for KVs of the same row, and these are always colocated in the same region (and hence on the same region server). Since the region server also hands out the TSs based on wall clock time (and assuming time does not go backwards), it follows that a KV assigned a later memTS cannot have an earlier TS.

Of course that is not the case if you use client-assigned TSs.

Maybe I should write a follow-up blog post that more clearly describes the relationship (or rather the absence thereof) between the memTS and the TS. The gist is that the memTS is strictly internal, used to guarantee ACID properties (HBase could have used read locks for this as well, and if it did, that would be transparent to the outside), whereas the TS is an application-level concept; it is part of the data (so to speak).

-- Lars
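The argument above (TS taken from the wall clock, memTS from a per-server counter, so a later memTS cannot carry an earlier TS as long as the clock does not go backwards) can be illustrated with a small sketch. This is a toy under stated assumptions, not the HRegion implementation: `SameRowOrderingSketch`, its injected clock, and the `long[] {ts, memTs}` return value are hypothetical names introduced here. The clock is injected so the example is deterministic and so the "time does not go backwards" assumption can be deliberately broken.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Toy model of server-assigned ts and memTs for a single row. Not HBase code. */
class SameRowOrderingSketch {
    private final AtomicLong writeNumber = new AtomicLong(); // stand-in for the memTS source
    private final LongSupplier clock;                        // stand-in for the wall clock
    private final Object rowLock = new Object();

    SameRowOrderingSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Performs one put on the modelled row and returns {ts, memTs}. */
    long[] put() {
        synchronized (rowLock) {
            long ts = clock.getAsLong();                // ts comes from the clock
            long memTs = writeNumber.incrementAndGet(); // memTs comes from a counter
            return new long[] {ts, memTs};
        }
    }

    /** Checks the claim "memTs1 < memTs2 implies ts1 <= ts2" for two puts. */
    static boolean orderingHolds(long[] first, long[] second) {
        return !(first[1] < second[1]) || first[0] <= second[0];
    }

    public static void main(String[] args) {
        AtomicLong fakeMillis = new AtomicLong(1000L);
        SameRowOrderingSketch row = new SameRowOrderingSketch(fakeMillis::get);

        long[] a = row.put();
        long[] b = row.put(); // same millisecond: equal ts, larger memTs
        System.out.println("clock steady: " + orderingHolds(a, b));

        fakeMillis.set(999L); // the clock steps backwards
        long[] c = row.put();
        System.out.println("clock stepped back: " + orderingHolds(b, c));
    }
}
```

The second check fails in this toy, which is why the implication is a consequence of the implementation and the clock assumption rather than a guarantee.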
-
Re: memstore timestamp and visible timestamp
Wei Tan
2012-08-03, 19:43
Hi Lars,

"Since the region server also hands out the TSs based on wall clock time (and assuming time does not go backwards) it follows that a KV assigned a later memTS cannot have an earlier TS."

I assume that this applies ONLY when we talk about two KVs in the SAME row? Reading the code of put(), I found that the row is locked on entering the put, then the TS is assigned, and later the memTS is assigned. This makes sense: only after this put is done can another put obtain the row lock, and it will therefore obtain a larger TS and memTS.

However, this does NOT hold for two KVs that belong to different rows, right? Say we have two KVs: KV1 can enter the put earlier and get a smaller TS1, but it can be delayed a little in the code path and possibly get its memTS after KV2, correct?

Again, thanks :-)

Best Regards,
Wei
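The cross-row interleaving asked about above can be written out step by step. The sketch below only models the ordering Wei hypothesizes (TS assigned first, memTS assigned later, with no lock shared between different rows); it has not been checked against the HBase code path. `CrossRowInterleavingSketch`, `PendingPut`, and the fake millisecond clock are hypothetical names for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of two puts on different rows interleaving. Not HBase code. */
class CrossRowInterleavingSketch {

    /** A put in flight; -1 means "not assigned yet". */
    static final class PendingPut {
        final String row;
        long ts = -1L;
        long memTs = -1L;

        PendingPut(String row) {
            this.row = row;
        }
    }

    private final AtomicLong fakeClockMillis = new AtomicLong(1000L); // advances on every read
    private final AtomicLong writeNumber = new AtomicLong();           // stand-in for the memTS source

    /** Step 1 of a put in this model: take the ts from the clock. */
    void assignTs(PendingPut p) {
        p.ts = fakeClockMillis.getAndIncrement();
    }

    /** Step 2 of a put in this model: take the next write number. */
    void assignMemTs(PendingPut p) {
        p.memTs = writeNumber.incrementAndGet();
    }

    public static void main(String[] args) {
        CrossRowInterleavingSketch server = new CrossRowInterleavingSketch();
        PendingPut kv1 = new PendingPut("rowA");
        PendingPut kv2 = new PendingPut("rowB");

        server.assignTs(kv1);    // KV1 enters first and gets the smaller ts
        server.assignTs(kv2);
        server.assignMemTs(kv2); // KV1 is delayed, so KV2 gets its memTs first
        server.assignMemTs(kv1);

        System.out.println("ts1 < ts2: " + (kv1.ts < kv2.ts));
        System.out.println("memTs1 > memTs2: " + (kv1.memTs > kv2.memTs));
    }
}
```

Because the two rows do not share a lock in this model, nothing prevents the two steps of one put from straddling the other put.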
-
Re: memstore timestamp and visible timestamp
lars hofhansl
2012-08-03, 20:05
"I assume that this applies ONLY when we talk about two KVs in the SAME row?"

Possibly... I would need to look at the code a bit more closely. Since HBase only provides ACID guarantees within a row, it should not matter. (Well, that is, except for the work I did in HBASE-5229.)

-- Lars
-
RE: memstore timestamp and visible timestamp
Vladimir Rodionov
2012-08-03, 21:27
Time can go backwards. One time a year. By one hour.

I may be wrong, but it seems that the situation described by TS (memTS1 > memTS2 and ts1 < ts2) is possible. But under concurrent updates in a distributed environment, the only way to guarantee "fairness" of operations is to put all of them into one global queue. I really doubt that this is what people need (or want).

Upd: It is possible to keep a queue per server-row inside the RS. This is a question of how we define the order of requests in a concurrent environment. We can have one global queue, one queue per RS, or (at the lowest granularity) one queue per row key, but the most efficient way (and of course not the fairest) is to add an element of randomness: let the OS decide which thread gets a time slot first.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
-
Re: memstore timestamp and visible timestamplars hofhansl 2012-08-03, 21:44
We are also mixing concepts here.
memTS is a region-server-local concept; there is no distributed aspect to it. The whole memTS vs. TS discussion is somewhat pointless, as memTS is an internal concept and TS is part of the client-visible data. A single Put can add many columns with the same rowkey, all with different TSs, and all of these changes still become visible atomically; that is handled by the memTS.
-- Lars
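To make the atomic-visibility point concrete, here is a minimal sketch of memTS-style visibility. It is a toy model, not HBase code: the class `MemTsSketch` and its methods (`beginPut`, `addCell`, `commit`, `scan`) are hypothetical names invented for illustration, and the read-point rule (everything below the oldest in-flight write number is readable) is an assumed simplification. It shows one Put writing two columns with different TSs, neither of which a scanner sees until the whole Put commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of memTS-based visibility. Hypothetical names; not HBase's implementation. */
class MemTsSketch {
    /** A cell carries a client-visible ts and an internal memTs (write number). */
    static final class Cell {
        final String column;
        final long ts;     // client-visible timestamp, part of the data
        final long memTs;  // internal write number, never shown to clients
        final String value;

        Cell(String column, long ts, long memTs, String value) {
            this.column = column;
            this.ts = ts;
            this.memTs = memTs;
            this.value = value;
        }
    }

    private final List<Cell> memstore = new ArrayList<>();
    private final TreeSet<Long> inFlight = new TreeSet<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0; // scanners only see cells with memTs <= readPoint

    /** Starts a Put: one write number is handed out for the entire Put. */
    synchronized long beginPut() {
        long writeNumber = nextWriteNumber++;
        inFlight.add(writeNumber);
        return writeNumber;
    }

    /** Adds one column of the Put; the ts may differ per column. */
    synchronized void addCell(long writeNumber, String column, long ts, String value) {
        memstore.add(new Cell(column, ts, writeNumber, value));
    }

    /** Commits a Put; the read point never moves past an older in-flight write. */
    synchronized void commit(long writeNumber) {
        inFlight.remove(writeNumber);
        readPoint = inFlight.isEmpty() ? nextWriteNumber - 1 : inFlight.first() - 1;
    }

    /** Returns the cells a scanner may see right now, formatted as column@ts=value. */
    synchronized List<String> scan() {
        List<String> visible = new ArrayList<>();
        for (Cell c : memstore) {
            if (c.memTs <= readPoint) {
                visible.add(c.column + "@" + c.ts + "=" + c.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        MemTsSketch region = new MemTsSketch();
        long w = region.beginPut();
        region.addCell(w, "colA", 300L, "x");
        System.out.println("before commit: " + region.scan());
        region.addCell(w, "colB", 100L, "y");
        region.commit(w);
        System.out.println("after commit:  " + region.scan());
    }
}
```

Note that the TS values (300 and 100) play no role in visibility here; only the write number does, which is the separation of concerns described in the message above.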
-
Re: memstore timestamp and visible timestamp Wei Tan 2012-08-03, 21:58
Hi Lars,
I agree with you on that. This comparison only makes sense when we have two concurrent puts to the same key, i.e., (k, v1) and (k, v2), AND we ask the region server to assign their timestamps. The relation between ts(kv1) and ts(kv2) can be non-deterministic, which is fine. A desired feature, which seems to be already satisfied in the current implementation, is that if ts(kv1) < ts(kv2), then kv1 should probably become visible earlier as the latest value (i.e., memts(kv1) < memts(kv2)); otherwise it never will be! Thank you Lars!
Best Regards, Wei
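The "otherwise it never will be" case can be sketched as follows. This is a toy model under stated assumptions, not HBase code: `VersionOrderSketch`, `Kv`, `visibleVersions` and `latest` are hypothetical names, visibility is modeled as memTs <= readPoint, and the "latest" version is simply the visible one with the highest ts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model: visibility is decided by memTs, "latest" is decided by ts. Not HBase code. */
class VersionOrderSketch {
    static final class Kv {
        final String value;
        final long ts;    // client-visible timestamp
        final long memTs; // internal write number

        Kv(String value, long ts, long memTs) {
            this.value = value;
            this.ts = ts;
            this.memTs = memTs;
        }
    }

    /** Returns the versions visible at the given read point, newest ts first. */
    static List<Kv> visibleVersions(List<Kv> all, long readPoint) {
        List<Kv> out = new ArrayList<>();
        for (Kv kv : all) {
            if (kv.memTs <= readPoint) {
                out.add(kv);
            }
        }
        out.sort(Comparator.comparingLong((Kv kv) -> kv.ts).reversed());
        return out;
    }

    /** Returns the value of the newest visible version, or null if none is visible. */
    static String latest(List<Kv> all, long readPoint) {
        List<Kv> visible = visibleVersions(all, readPoint);
        return visible.isEmpty() ? null : visible.get(0).value;
    }

    public static void main(String[] args) {
        List<Kv> key = new ArrayList<>();
        key.add(new Kv("v1", 200L, 1L)); // commits first, but has the higher ts
        key.add(new Kv("v2", 100L, 2L)); // commits second, with the lower ts
        for (long readPoint = 0; readPoint <= 2; readPoint++) {
            System.out.println("readPoint=" + readPoint + " latest=" + latest(key, readPoint));
        }
    }
}
```

In this model v2 is never the latest value, yet it stays retrievable as an older version, matching the remark earlier in the thread that the older edit (in terms of TS) is not lost. Per the same thread, with region-server-assigned TSs a later memTS implies an equal or later TS, so this inverted ordering would need client-assigned TSs or a clock that steps backwards.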
-
Re: memstore timestamp and visible timestamp Wei Tan 2012-08-06, 14:25
Hi,
A follow-up question: when using region-server-assigned ts, are these two assumptions true?
1. It is impossible for two puts on the same row to have the same ts, regardless of whether autoflush is enabled.
2. prePut, internalPut, and postPut all run inside put, which in turn is guarded by a row lock. Therefore the whole processing of a put on a row, including its associated coprocessors, will NOT interleave in time with another put on the same row.
These semantics are very important for my implementation of a coprocessor.
Best Regards, Wei
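On the first assumption, an earlier reply in this thread already notes that with region-server-assigned TSs two updates "might happen with the same ms". The sketch below only illustrates what a millisecond collision would mean for a store that keys versions by ts. It is a toy model, not HBase code: `SameMillisSketch` and its methods are hypothetical names, the clock is injected so the collision can be forced deterministically, and whether HBase's own row locking rules this out is not settled anywhere in the thread.

```java
import java.util.TreeMap;
import java.util.function.LongSupplier;

/** Toy single-cell store keyed by a server-assigned millisecond ts. Not HBase code. */
class SameMillisSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private final LongSupplier clock;

    /** The clock is injected; System::currentTimeMillis would be the wall-clock choice. */
    SameMillisSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Stores a value under the current clock reading and returns the ts used. */
    long put(String value) {
        long ts = clock.getAsLong();
        versions.put(ts, value); // same ts means the earlier value is replaced
        return ts;
    }

    /** Number of distinct versions kept in this model. */
    int versionCount() {
        return versions.size();
    }

    /** Value stored under the highest ts, or null if the store is empty. */
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        // A frozen clock stands in for two puts landing in the same millisecond.
        SameMillisSketch cell = new SameMillisSketch(() -> 1000L);
        long ts1 = cell.put("v1");
        long ts2 = cell.put("v2");
        System.out.println("same ts: " + (ts1 == ts2));
        System.out.println("versions kept: " + cell.versionCount());
        System.out.println("latest: " + cell.latest());
    }
}
```

A coprocessor that needs every put on a row to be distinguishable by ts alone would therefore want either an explicit check or its own tie-breaker, unless the answer to the question above confirms that collisions cannot occur.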