HBase, mail # user - Put w/ timestamp -> Deleteall -> Put w/ timestamp fails


Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
Harsh J 2012-08-15, 12:50
Yonghu,

You are correct. Until a major_compact finishes, values inserted with
timestamps at or below the delete's timestamp will not show up in
scans. Any such values inserted after the delete but before the major
compaction will all go away when the compaction runs.

That is why, in the shell output I sent, I put the data into the table
_after_ the major_compact ran.
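
The masking behaviour can be sketched with a toy model. This is not HBase code: the class ToyRowStore, its methods, and the fixed "now" value of 1000 are all hypothetical. It only assumes what is described in this thread, namely that a delete writes a marker stamped with the current time, that scans hide any put whose timestamp is at or below a marker, and that a major compaction drops both the markers and the masked puts. Real HBase has more detail (marker types, per-family scope, version limits) that this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRowStore:
    """Toy model of one row/column; NOT the HBase implementation."""
    puts: dict = field(default_factory=dict)         # timestamp -> value
    tombstones: list = field(default_factory=list)   # delete-marker timestamps

    def put(self, ts, value):
        self.puts[ts] = value

    def delete_all(self, now):
        # A delete only records a marker stamped "now"; nothing is removed yet.
        self.tombstones.append(now)

    def _masked(self, ts):
        # A put is hidden if any marker has an equal or newer timestamp.
        return any(ts <= marker for marker in self.tombstones)

    def scan(self):
        # Return (timestamp, value) of the newest visible put, or None.
        visible = {ts: v for ts, v in self.puts.items() if not self._masked(ts)}
        if not visible:
            return None
        newest = max(visible)
        return newest, visible[newest]

    def major_compact(self):
        # Masked puts and the markers themselves are dropped for good.
        self.puts = {ts: v for ts, v in self.puts.items() if not self._masked(ts)}
        self.tombstones.clear()


def replay():
    """Replay the sequence from this thread and collect each scan result."""
    results = []
    store = ToyRowStore()
    store.put(10, "value")
    store.delete_all(now=1000)    # marker stamped with the (assumed) current time
    store.put(10, "value")        # same old timestamp: masked
    results.append(store.scan())
    store.put(11, "value")        # timestamp + 1 is still older than the marker
    results.append(store.scan())
    store.major_compact()         # marker and both masked puts are removed
    results.append(store.scan())
    store.put(10, "value")        # no marker left, so the put is visible
    results.append(store.scan())
    return results


print(replay())  # [None, None, None, (10, 'value')]
```

In this model the two puts made between the delete and the compaction never come back, and only the put made after major_compact is visible.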

On Wed, Aug 15, 2012 at 5:18 PM, yonghu <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> I have a question about your description. The delete marker masks a
> newly inserted value with an old timestamp, which is why the newly
> inserted data can't be seen. But after a major compaction, this new
> value will be seen again. So my question is how the deletion is
> actually executed. As I understand it, the deletion removes all data
> values whose timestamps are less than or equal to the timestamp of the
> delete marker. So if you insert a value with an old timestamp after
> you insert a delete marker, it should also be deleted at compaction
> time. For example, suppose I first insert (k1,t1), then delete (k1,t1)
> with a delete marker whose timestamp is greater than t1, and then
> reinsert (k1,t1). At compaction time, both (k1,t1) entries should be
> deleted.
>
> I look forward to your response!
>
> Yong
>
>
>
> On Wed, Aug 15, 2012 at 7:53 AM, Takahiko Kawasaki <[EMAIL PROTECTED]> wrote:
>> Dear Harsh,
>>
>> Thank you very much for your detailed explanation. I could understand
>> what had been going on during my put/scan/delete operations. I'll modify
>> my application and test programs taking the timestamp implementation
>> into consideration.
>>
>> Best Regards,
>> Takahiko Kawasaki
>>
>> 2012/8/15 Harsh J <[EMAIL PROTECTED]>
>>
>>> When a Delete occurs, a delete entry is inserted with its timestamp
>>> set to the current time (to indicate it is the latest version).
>>> Hence, when you insert a value after this with an _older_ timestamp,
>>> it is not taken as the latest version and is ignored when scanning.
>>> This is why you do not see the data.
>>>
>>> If you instead insert the value after a major compaction has fully
>>> run on this store file, then your value will indeed show up after
>>> the insert, because at that moment no delete entry with a later
>>> timestamp exists at all.
>>>
>>> hbase(main):060:0> flush 'test-table'
>>> 0 row(s) in 0.1020 seconds
>>>
>>> hbase(main):061:0> major_compact 'test-table'
>>> 0 row(s) in 0.0400 seconds
>>>
>>> hbase(main):062:0> put 'test-table', 'row4', 'test-family', 'value', 10
>>> 0 row(s) in 0.0230 seconds
>>>
>>> hbase(main):063:0> scan 'test-table'
>>> ROW                   COLUMN+CELL
>>>  row4                 column=test-family:, timestamp=10, value=value
>>> 1 row(s) in 0.0060 seconds
>>>
>>> I suppose this is why it is recommended not to mess with the
>>> timestamps manually, and instead just rely on versions.
>>>
>>> On Tue, Aug 14, 2012 at 8:24 PM, Takahiko Kawasaki <[EMAIL PROTECTED]>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a problem where 'put' with timestamp does not succeed.
>>> > I did the following at the HBase shell.
>>> >
>>> > (1) Do 'put' with timestamp.
>>> >       # 'scan' shows 1 row.
>>> >
>>> > (2) Delete the row by 'deleteall'.
>>> >       # 'scan' says "0 row(s)".
>>> >
>>> > (3) Do 'put' again by the same command line as (1).
>>> >       # 'scan' says "0 row(s)" ! Why?
>>> >
>>> > (4) Increment the timestamp value by 1 and try 'put' again.
>>> >       # 'scan' still says "0 row(s)"! Why?
>>> >
>>> > The command lines I actually typed are as follows and the attached
>>> > file is the output from the command lines.
>>> >
>>> > scan 'test-table'
>>> > put 'test-table', 'row3', 'test-family', 'value'
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row3'
>>> > scan 'test-table'
>>> > put 'test-table', 'row3', 'test-family', 'value'
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row3'
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row4'
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10

Harsh J