HBase >> mail # user >> delete operation with timestamp


Re: delete operation with timestamp
Slightly offtopic, sorry.

While we have attention on timestamps, may I ask why HBase maintains a
timestamp at the row level (initialized with LATEST_TIMESTAMP)?
In other words, a timestamp has meaning in the context of a cell, and
HBase keeps it at that level, so why also keep one timestamp at the row
level? Going further, what is the meaning of a timestamp 'ts' associated
with a Put object if all of its KeyValue objects have timestamps
different from 'ts'?

Was the motivation behind this to let the client omit the timestamp
(and in turn assume they meant the latest ts)?
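
One way to picture that reading is the toy Python model below. It is a sketch, not HBase code: ToyPut and resolve_on_server are hypothetical names, and the model assumes the Put-level timestamp is only a default for cells added without their own timestamp, with any remaining LATEST_TIMESTAMP placeholder replaced by the server clock at write time.

```python
import time

# Sentinel meaning "no timestamp chosen yet"; HBase uses Long.MAX_VALUE for this.
LATEST_TIMESTAMP = 2**63 - 1


class ToyPut:
    """Hypothetical stand-in for a Put: one row key, one default ts, many cells."""

    def __init__(self, row, ts=LATEST_TIMESTAMP):
        self.row = row
        self.ts = ts      # row-level default, used only when a cell has no explicit ts
        self.cells = []   # list of (family, qualifier, timestamp, value)

    def add(self, family, qualifier, value, ts=None):
        # An explicit per-cell timestamp wins; otherwise inherit the Put-level one.
        cell_ts = self.ts if ts is None else ts
        self.cells.append((family, qualifier, cell_ts, value))
        return self


def resolve_on_server(put, now_ms=None):
    """Replace any remaining LATEST_TIMESTAMP placeholder with the server clock."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return [
        (f, q, now_ms if ts == LATEST_TIMESTAMP else ts, v)
        for (f, q, ts, v) in put.cells
    ]


# Usage: one cell inherits the default, the other carries its own timestamp.
put = ToyPut("r1").add("f1", "c1", "new").add("f1", "c2", "old", ts=1315550678308)
resolved = resolve_on_server(put, now_ms=1322119570316)
```

In this model a Put whose cells all carry explicit timestamps never consults its row-level 'ts' at all, which is the situation the question above describes.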

I am looking at line 5 of this function http://pastebin.com/ik1Dxgqq,
which serializes the timestamp at the row level, and at lines 18-21,
which serialize the timestamp at the cell level.

Thanks.
On Mon, Nov 28, 2011 at 3:56 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> Hi Yi,
> the reason is that nothing is ever changed in place in HBase; only new files are created (with the exception of the WAL, which is appended to,
> and some special scenarios such as atomic increments and atomic appends, where older versions of the cells are removed from the memstore).
>
> That caters very well to the performance characteristics of the underlying distributed file system (HDFS).
>
>
> Consequently, deleted rows are not actually deleted right away; we just record the fact that the rows should no longer be visible and can eventually be removed.
> The actual removal happens during the next compaction when new files are created.
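
The marker-then-compaction behavior described above can be sketched with a small append-only model. This is a toy in Python under stated assumptions, not how HBase stores data: ToyStore and its methods are hypothetical names, a single list stands in for the store files, and a row-level delete marker is assumed to hide every put of that row with a timestamp at or below the marker's.

```python
class ToyStore:
    """Hypothetical append-only store: writes and deletes are both new records."""

    def __init__(self):
        self.records = []  # (kind, row, timestamp, value); never edited in place

    def put(self, row, ts, value):
        self.records.append(("put", row, ts, value))

    def delete_row(self, row, ts):
        # A delete only appends a marker; the data it hides is still stored.
        self.records.append(("delete", row, ts, None))

    def _masked(self, row, ts):
        return any(k == "delete" and r == row and ts <= dts
                   for (k, r, dts, _) in self.records)

    def get(self, row):
        visible = [(ts, v) for (k, r, ts, v) in self.records
                   if k == "put" and r == row and not self._masked(row, ts)]
        return max(visible)[1] if visible else None

    def compact(self):
        # Rewrite into a new list, dropping markers and the puts they hide.
        self.records = [(k, r, ts, v) for (k, r, ts, v) in self.records
                        if k == "put" and not self._masked(r, ts)]


# Usage: the put is hidden immediately but only physically dropped by compact().
store = ToyStore()
store.put("r1", 100, "a")
store.delete_row("r1", 200)
hidden_value = store.get("r1")          # None: the marker hides the put
records_before = len(store.records)     # 2: put and marker are both still stored
store.compact()
records_after = len(store.records)      # 0: both are gone after the rewrite
```
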
>
> Sometimes that does lead to unexpected behaviors such as the one you describe below.
>
> In the trunk version of HBase I introduced the possibility to perform time-range queries that can "peek" behind delete markers to retrieve cells that are marked as deleted. (HBASE-4536)
>
> -- Lars
>
>
> ----- Original Message -----
> From: Yi Liang <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Thursday, November 24, 2011 10:11 PM
> Subject: Re: delete operation with timestamp
>
> Thanks, Daniel, for your explanation. But I am still curious why it was
> designed this way; it is unexpected to me.
>
> Also, this behavior of deleteColumns makes the delete operation not very
> user-friendly. Why not use deleteColumn instead in the hbase shell and thrift client?
>
> Thanks,
> Yi
>
> 2011/11/24 Daniel Gómez Ferro <[EMAIL PROTECTED]>
>
>>
>> On Nov 24, 2011, at 08:38 , Yi Liang wrote:
>>
>> > We're using hbase-0.90.3 with thrift client, and have encountered some
>> > problems when we want to delete one specific version of a cell.
>> >
>> > First, there's no corresponding thrift api for
>> > Delete#deleteColumn(byte[] family, byte[] qualifier, long timestamp).
>> > Instead, deleteColumns is
>> > supported in mutateRowTs.  But what we want is deleteColumn as we need to
>> > keep the older versions. IMO, we should implement mutateRowTs
>> > with deleteColumn, rather than deleteColumns. The hbase shell's delete
>> > command has the same problem.
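
The difference between the two calls can be illustrated with a toy Python function. It is a sketch, not the HBase implementation: visible_versions and the marker kinds "version" and "column" are hypothetical names, and the model assumes a deleteColumn-style marker hides exactly one timestamp while a deleteColumns-style marker hides every version at or below its timestamp.

```python
def visible_versions(puts, markers):
    """Return the put timestamps of one column that survive the delete markers.

    puts    -- iterable of cell timestamps
    markers -- list of (kind, ts); "version" hides exactly ts (deleteColumn-style),
               "column" hides every timestamp <= ts (deleteColumns-style)
    """
    def hidden(ts):
        for kind, marker_ts in markers:
            if kind == "version" and ts == marker_ts:
                return True
            if kind == "column" and ts <= marker_ts:
                return True
        return False

    return sorted(ts for ts in puts if not hidden(ts))


# Usage with the two timestamps from the shell session in this message.
puts = [1315550678308, 1322119570316]
after_version_delete = visible_versions(puts, [("version", 1322119570316)])
after_column_delete = visible_versions(puts, [("column", 1322119570316)])
# after_version_delete -> [1315550678308]
# after_column_delete  -> []
```

Because the model checks markers regardless of insertion order, a cell written with an older timestamp after a "column" marker stays hidden for as long as the marker exists.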
>> >
>> > Second, we find we can't reinsert any older cell if we have deleted that
>> > cell with deleteColumns. For example:
>> > hbase(main):007:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > 0 row(s) in 0.0110 seconds
>> >
>> > hbase(main):008:0> put 'test3', 'r1', 'f1:c1', 'old', 1315550678308
>> > 0 row(s) in 0.0100 seconds
>> >
>> > hbase(main):009:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > r1                                           column=f1:c1, timestamp=1315550678308, value=old
>> > 1 row(s) in 0.0290 seconds
>> >
>> > hbase(main):012:0> put 'test3', 'r1', 'f1:c1', 'new'
>> > 0 row(s) in 0.0090 seconds
>> >
>> > hbase(main):013:0> scan 'test3'
>> > ROW                                           COLUMN+CELL
>> > r1                                           column=f1:c1, timestamp=1322119570316, value=new
>> > 1 row(s) in 0.0140 seconds
>> >
>> > hbase(main):014:0> delete 'test3', 'r1', 'f1:c1', 1322119570316
>> > 0 row(s) in 0.0130 seconds
>> >
>> > hbase(main):015:0> scan 'test3'