HBase >> mail # user >> Why does a delete behave like this?


Niels Basjes 2013-12-09, 08:47
Stack 2013-12-09, 21:30
Stack 2013-12-09, 23:27
Ted Yu 2013-12-09, 17:55
lars hofhansl 2013-12-09, 22:53
Ted Yu 2013-12-10, 04:16

Re: Why does a delete behave like this?
https://issues.apache.org/jira/browse/HBASE-9005  :)
Just have to do it now.

________________________________
 From: Ted Yu <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Monday, December 9, 2013 8:16 PM
Subject: Re: Why does a delete behave like this?
 
I ran the following shell command to create the table:

hbase(main):001:0> create 't1', {NAME => 'cf', KEEP_DELETED_CELLS => true}

With this setting, the second get command in Niels' example returns the same result as the first.

Lars:
The refguide doesn't cover such usage. Do you think we should document it?

Cheers

On Mon, Dec 9, 2013 at 2:53 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

>This is because by default a delete marker extends all the way back in time.
>When you set KEEP_DELETED_CELLS for your column family, this behavior is corrected, i.e. you get correct time-range query behavior even w.r.t. deletes.
>
>
>-- Lars
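
Lars's explanation can be illustrated with a small toy model in Python. This is not HBase code and does not call any HBase API. The function `get_row`, the tuple layouts, and the half-open time range `[min_ts, max_ts)` are assumptions made for illustration. The assumed rule is: by default a delete marker masks older versions of its column even when the marker itself lies after the queried time range, while with KEEP_DELETED_CELLS only markers inside the queried range take effect.

```python
from typing import Dict, List, Tuple

Put = Tuple[str, int, str]      # (column, timestamp, value), hypothetical layout
DeleteMarker = Tuple[str, int]  # (column, timestamp), hypothetical layout


def get_row(
    puts: List[Put],
    markers: List[DeleteMarker],
    min_ts: int,
    max_ts: int,
    keep_deleted_cells: bool = False,
) -> Dict[str, Tuple[int, str]]:
    """Toy time-range read: newest visible version per column in [min_ts, max_ts)."""

    def marker_applies(marker_ts: int) -> bool:
        if keep_deleted_cells:
            # Assumed rule: only markers inside the queried range are honored.
            return min_ts <= marker_ts < max_ts
        # Assumed default: markers inside OR after the range are honored,
        # so a later delete reaches "all the way back in time".
        return marker_ts >= min_ts

    result: Dict[str, Tuple[int, str]] = {}
    for column, ts, value in puts:
        if not (min_ts <= ts < max_ts):
            continue  # version is outside the queried time range
        masked = any(
            m_col == column and ts <= m_ts and marker_applies(m_ts)
            for m_col, m_ts in markers
        )
        if masked:
            continue  # hidden by a delete marker at or after this version
        if column not in result or ts > result[column][0]:
            result[column] = (ts, value)
    return result


# Same data as the example in the original question.
puts = [("cf:1", 1000, "One"), ("cf:2", 2000, "Two"), ("cf:3", 3000, "Three")]
markers = [("cf:1", 4000)]

default_view = get_row(puts, markers, 0, 3500)
kept_view = get_row(puts, markers, 0, 3500, keep_deleted_cells=True)

print(sorted(default_view))  # cf:1 is hidden by the later delete
print(sorted(kept_view))     # cf:1 is visible again
```

In this model the default read of the range 0 to 3500 loses cf:1 because of the marker at 4000, which matches the second get in the question. With `keep_deleted_cells=True` the same read returns all three columns, which matches what Ted reports.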
>
>
>
>________________________________
> From: Niels Basjes <[EMAIL PROTECTED]>
>To: user <[EMAIL PROTECTED]>
>Sent: Monday, December 9, 2013 12:47 AM
>Subject: Why does a delete behave like this?
>
>
>
>Hi,
>
>When I first started learning about HBase I compared the logic of setting
>new values to something that is similar to the way a tool like Subversion
>works: When you set a new value you don't overwrite the old one, you simply
>create a new version.
>Just like Subversion, you can then at a later moment retrieve the old value
>and that way see the situation at an earlier date.
>
>(The only real variation to the SVN model is that HBase only retains the
>last N versions of a cell.)
>
>There is however one situation where this comparison really fails: When you
>do a delete on a cell.
>If you want to retrieve the state of a thing from subversion and in the
>current version this thing has been deleted then you can still get it back.
>With HBase however if you delete a cell you place a tombstone at a specific
>time and as such internally the older values are still present.
>
>But when you try to retrieve such an older value then you still get an
>empty result back (i.e. no such cell).
>The direct consequence of the currently implemented model is that an
>application can never retrieve the correct state of a row at an older
>timestamp if a delete on any cell has occurred.
>
>Example:
>
>I create a table with one row:
>
>> create 't1', 'cf'
>> put 't1', 'rowid', 'cf:1', 'One', 1000
>> put 't1', 'rowid', 'cf:2', 'Two', 2000
>> put 't1', 'rowid', 'cf:3', 'Three', 3000
>> get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
>    COLUMN                     CELL
>     cf:1                      timestamp=1000, value=One
>     cf:2                      timestamp=2000, value=Two
>     cf:3                      timestamp=3000, value=Three
>    3 row(s) in 0.0150 seconds
>
>Then the delete of a cell at a later timestamp:
>
>> delete 't1', 'rowid', 'cf:1', 4000
>
>Now if I retrieve the row at time 3500, I would find it logical to
>still see the same values as above.
>This is however the reality:
>
>> get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
>    COLUMN                     CELL
>     cf:2                      timestamp=2000, value=Two
>     cf:3                      timestamp=3000, value=Three
>    2 row(s) in 0.0120 seconds
>
>
>Why has it been designed/implemented like this?
>What is the logic behind this model?
>
>--
>Best regards / Met vriendelijke groeten,
>
>Niels Basjes