HBase >> mail # user >> Still Seeing Old Data After a Delete


Re: Still Seeing Old Data After a Delete
Hi Shawn,

My HBase version is 0.92.0. I should mention that I recently noticed
that the delete semantics of the shell and the Java API are different.
In the shell, deleting one version also masks the versions whose
timestamps are older than that version, so a scan will not return the
values whose timestamps are older than that one. With the Java API,
e.g. the delete.deleteColumn() method, only that specific version is
deleted; versions whose timestamps are older than that one are not
affected. I hope that's useful for you!

Regards!

Yong
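
Note: the difference described above can be illustrated with a small
toy model. This is only a sketch under stated assumptions, not HBase
code: the class DeleteMarkerSketch and its methods are hypothetical
names invented for illustration, the strings "s" and "u" stand in for
the byte values from the scan output quoted below, and family delete
markers are left out. It models a single column in which a version
marker masks exactly one timestamp, while a column marker masks every
version at or before its timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one column's versions and delete markers. NOT HBase code. */
final class DeleteMarkerSketch {
    // timestamp -> value, iterated newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    // version markers: each masks exactly one timestamp
    private final List<Long> versionMarkers = new ArrayList<>();
    // column markers: each masks every timestamp <= the marker's timestamp
    private final List<Long> columnMarkers = new ArrayList<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Hypothetical stand-in for deleteColumn(): masks only the newest visible version. */
    void deleteLatestVersion() {
        for (long ts : versions.keySet()) {
            if (!isMasked(ts)) {
                versionMarkers.add(ts);
                return;
            }
        }
    }

    /** Hypothetical stand-in for deleteColumns(): masks all versions at or before ts. */
    void deleteAllVersionsUpTo(long ts) {
        columnMarkers.add(ts);
    }

    /** Returns the newest unmasked value, or null if every version is masked. */
    String scan() {
        for (long ts : versions.keySet()) {
            if (!isMasked(ts)) {
                return versions.get(ts);
            }
        }
        return null;
    }

    private boolean isMasked(long ts) {
        if (versionMarkers.contains(ts)) {
            return true;
        }
        for (long marker : columnMarkers) {
            if (ts <= marker) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DeleteMarkerSketch column = new DeleteMarkerSketch();
        column.put(1332795701976L, "s"); // original value
        column.put(1332866682295L, "u"); // updated value

        // Version marker: only the newest version is masked, so the old value reappears.
        column.deleteLatestVersion();
        System.out.println("after version delete: " + column.scan());

        // Column marker: every version is masked, so the scan finds nothing.
        column.deleteAllVersionsUpTo(Long.MAX_VALUE);
        System.out.println("after column delete: " + column.scan());
    }
}
```

In this model the first delete leaves "s" visible again, which matches
the behaviour reported in the original message, and the second delete
hides the column entirely.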

On Tue, Mar 27, 2012 at 7:33 PM, Shawn Quinn <[EMAIL PROTECTED]> wrote:
> Hi Lars,
>
> Thanks for the quick reply!  In this case we were doing a column delete
> like so:
>
>            Delete delete = new Delete(rowKey);
>            delete.deleteColumn(Bytes.toBytes("thing"),
>                    Bytes.toBytes(value));
>            table.delete(delete);
>
> However, your response caused me to notice the "Delete.deleteColumns()"
> method in the JavaDoc instead of simply "Delete.deleteColumn()".  Calling
> the "deleteColumns" instead of "deleteColumn" fixes the problem we were
> seeing.  That wasn't immediately obvious to me after reading the book, but
> after reading the JavaDoc I now understand the distinction between the two
> methods.
>
> I may be the only one who missed that at first, but in case others have a
> similar confusion it might be worth a comment in the book that
> "deleteColumn()" is really only for deleting a single version and
> "deleteColumns()" is for deleting all versions.  E.g. the second type noted
> in the book currently is listed as "Delete column: for all versions of a
> column".  But, from the API perspective that's really the "deleteColumns()"
> method.  (Whereas, my incorrect intuition when just looking at the API was
> that the "deleteColumns()" method would likely be for deleting multiple
> different columns.)
>
> Thanks again for the quick follow up,
>
>     -Shawn
>
> On Tue, Mar 27, 2012 at 1:19 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Hey Shawn,
>>
>> how exactly did you delete the column?
>> There are three types of delete markers: family, column, version.
>> Your observation would be consistent with having used a version delete
>> marker, which just marks a specific version (the latest by default) for
>> delete.
>>
>> Check out the HBase Reference Guide:
>> http://hbase.apache.org/book.html#version.delete
>>
>> Also, if you don't mind the plug see a more detailed discussion here:
>> http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Shawn Quinn <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc:
>> Sent: Tuesday, March 27, 2012 10:01 AM
>> Subject: Still Seeing Old Data After a Delete
>>
>> Hello,
>>
>> In a couple of situations we were noticing some odd problems with old data
>> appearing in the application, and I finally found a reproducible scenario.
>> Here's what we're seeing in one basic case:
>>
>> 1. Using a scan in hbase shell one of our column cells (both the column
>> name and value are simple longs) looks like so:
>>
>> column=thing:\x00\x00\x00\x00\x00\x00\x00\x02, timestamp=1332795701976,
>> value=\x00\x00\x00\x00\x00\x00\x00s
>>
>> 2. If we then use a "Put" to update that cell to a new value it looks as
>> we'd expect like so:
>>
>> column=thing:\x00\x00\x00\x00\x00\x00\x00\x02, timestamp=1332866682295,
>> value=\x00\x00\x00\x00\x00\x00\x00u
>>
>> 3. If we then use a "Delete" to remove that column, instead of the column
>> no longer being included in the scan we instead see the following again:
>>
>> column=thing:\x00\x00\x00\x00\x00\x00\x00\x02, timestamp=1332795701976,
>> value=\x00\x00\x00\x00\x00\x00\x00s
>>
>> So, for some reason, at least in this case, the tombstone/delete marker
>> doesn't appear to be preventing new scans from seeing the old
>> data.
>>
>> Note that this is a small development cluster of HBase (version: