-
Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
Takahiko Kawasaki 2012-08-14, 14:54
Hello,
I have a problem where 'put' with timestamp does not succeed. I did the following at the HBase shell.
(1) Do 'put' with timestamp. # 'scan' shows 1 row.
(2) Delete the row by 'deleteall'. # 'scan' says "0 row(s)".
(3) Do 'put' again by the same command line as (1). # 'scan' says "0 row(s)" ! Why?
(4) Increment the timestamp value by 1 and try 'put' again. # 'scan' still says "0 row(s)"! Why?
The command lines I actually typed are as follows and the attached file is the output from the command lines.
scan 'test-table'
put 'test-table', 'row3', 'test-family', 'value'
scan 'test-table'
deleteall 'test-table', 'row3'
scan 'test-table'
put 'test-table', 'row3', 'test-family', 'value'
scan 'test-table'
deleteall 'test-table', 'row3'
scan 'test-table'
put 'test-table', 'row4', 'test-family', 'value', 10
scan 'test-table'
deleteall 'test-table', 'row4'
scan 'test-table'
put 'test-table', 'row4', 'test-family', 'value', 10
scan 'test-table'
put 'test-table', 'row4', 'test-family', 'value', 10
scan 'test-table'
quit
Is this the specified behavior of HBase?
My cluster is built using CDH4 and the HBase version is 0.92.1-cdh4.0.0.
Could anyone give me any insight, please?
Best Regards, Takahiko Kawasaki
-
Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
Harsh J 2012-08-14, 15:46
When a Delete occurs, a delete marker is inserted with the current time as its timestamp (to indicate it is the latest version). Hence, when you insert a value after this with an _older_ timestamp, it is not treated as the latest version and is ignored when scanning. This is why you do not see the data.
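A rough way to picture this masking is the toy model below. It is a sketch in Python, not HBase code: `Cell`, `visible_cells`, and the `delete_time` value are made up for illustration, and the only rule it encodes is that a row's delete marker hides every cell whose timestamp is less than or equal to the marker's timestamp.

```python
# Toy model of read-time masking by a row-level delete marker.
# This is NOT HBase code; Cell and visible_cells are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    timestamp: int
    value: str


def visible_cells(cells, tombstones):
    """Return the cells a scan would show in this toy model.

    tombstones maps row -> timestamp of the delete marker for that row.
    A cell is hidden when its timestamp is <= the marker's timestamp.
    """
    shown = []
    for cell in cells:
        marker_ts = tombstones.get(cell.row)
        if marker_ts is not None and cell.timestamp <= marker_ts:
            continue  # masked by the delete marker
        shown.append(cell)
    return shown


# 'deleteall' stamps the marker with the current time in milliseconds.
delete_time = 1344956040000  # assumed value standing in for "now"
tombstones = {"row4": delete_time}

# Puts issued after the delete, but with explicit small timestamps.
cells = [Cell("row4", 10, "value"), Cell("row4", 11, "value")]
masked_result = visible_cells(cells, tombstones)

# A put whose timestamp is newer than the marker is not masked.
newer = [Cell("row4", delete_time + 1, "value")]
newer_result = visible_cells(newer, tombstones)
```

In this model the puts with timestamps 10 and 11 both stay hidden, which matches steps (3) and (4) of the original question; only a timestamp newer than the marker would show up in a scan.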
If you instead insert the value after a major compaction has fully run on this store file, it will indeed be shown after the insert, because at that point no entry with a later timestamp exists for the row at all.
hbase(main):060:0> flush 'test-table'
0 row(s) in 0.1020 seconds

hbase(main):061:0> major_compact 'test-table'
0 row(s) in 0.0400 seconds

hbase(main):062:0> put 'test-table', 'row4', 'test-family', 'value', 10
0 row(s) in 0.0230 seconds

hbase(main):063:0> scan 'test-table'
ROW COLUMN+CELL
row4 column=test-family:, timestamp=10, value=value
1 row(s) in 0.0060 seconds
I suppose this is why it is recommended not to mess with the timestamps manually, and instead just rely on versions.
-- Harsh J
-
Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
Takahiko Kawasaki 2012-08-15, 05:53
Dear Harsh,
Thank you very much for your detailed explanation. I could understand what had been going on during my put/scan/delete operations. I'll modify my application and test programs taking the timestamp implementation into consideration.
Best Regards, Takahiko Kawasaki
-
Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
yonghu 2012-08-15, 11:48
Hi Harsh,
I have a question about your description. The delete marker masks the newly inserted value with the old timestamp, which is why the newly inserted data can't be seen. But after a major compaction, this new value will be seen again. So my question is how the deletion is really executed. In my understanding, the deletion removes all data values whose timestamps are less than or equal to the timestamp of the delete marker. So if you insert a value with an old timestamp after you insert a delete marker, it should also be deleted at compaction time. For example, suppose I first insert (k1,t1), then delete (k1,t1) with a delete marker whose timestamp is greater than t1, and then reinsert (k1,t1). At compaction time, both (k1,t1) entries should be deleted.
I look forward to your response!
Yong
-
Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
Harsh J 2012-08-15, 12:50
Yonghu,
You are correct about that. Until a major_compact finishes, values inserted with old timestamps will never show. Values with old timestamps that are inserted after a delete but before a major compaction will all go away.
That is why I had to put the data into the table _after_ the major_compact ran, in the shell output I sent.
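The two orderings discussed here can be compared in a small Python sketch. It is a toy model under stated assumptions, not HBase code: `ToyStore` and its methods are hypothetical names, `NOW` is an arbitrary stand-in for the time of the delete, and the model ignores the memstore and flush step entirely.

```python
# Toy model of a delete marker and major compaction. NOT HBase code; ToyStore,
# put, delete_row, scan and major_compact are hypothetical names.
class ToyStore:
    def __init__(self):
        self.cells = []        # list of (row, timestamp, value)
        self.tombstones = {}   # row -> timestamp of the newest delete marker

    def put(self, row, timestamp, value):
        self.cells.append((row, timestamp, value))

    def delete_row(self, row, now):
        # The marker is stamped with the time of the delete.
        self.tombstones[row] = max(now, self.tombstones.get(row, now))

    def _masked(self, cell):
        row, timestamp, _ = cell
        marker = self.tombstones.get(row)
        return marker is not None and timestamp <= marker

    def scan(self):
        return [c for c in self.cells if not self._masked(c)]

    def major_compact(self):
        # Drop every masked cell, then drop the markers themselves.
        self.cells = [c for c in self.cells if not self._masked(c)]
        self.tombstones.clear()


NOW = 1_000_000  # assumed stand-in for the wall-clock time of the delete

# Case A: put with an old timestamp after the delete but BEFORE compaction.
a = ToyStore()
a.put("k1", 10, "v")
a.delete_row("k1", NOW)
a.put("k1", 10, "v")      # hidden by the marker
before_a = a.scan()
a.major_compact()         # both (k1, 10) entries are purged
after_a = a.scan()

# Case B: put with the same old timestamp AFTER compaction.
b = ToyStore()
b.put("k1", 10, "v")
b.delete_row("k1", NOW)
b.major_compact()         # the marker and the old entry are gone
b.put("k1", 10, "v")
after_b = b.scan()
```

In case A the reinserted value never becomes visible, before or after the compaction; in case B it is visible because no marker remains to mask it.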
Harsh J
-
Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
lars hofhansl 2012-08-15, 16:13
I also have a short blog post about this here: http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html
-
Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
Stack 2012-08-15, 20:46
On Wed, Aug 15, 2012 at 9:13 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> I also have a short blog post about this here: http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html

I added a link to this discussion to the Versioning section of our reference guide (thanks to all above).
St.Ack