HBase >> mail # user >> Delete all data before a given timestamp

Re: Delete all data before a given timestamp
You might be interested in HBASE-8784 (https://issues.apache.org/jira/browse/HBASE-8784).

----- Original Message -----
From: Chao Shi <[EMAIL PROTECTED]>
Sent: Monday, July 15, 2013 8:07 PM
Subject: Re: Delete all data before a given timestamp

Jean-Marc Spaggiari <jean-marc@...> writes:

> When you send a delete command to the server, you can specify a timestamp.
> So, as the result of your MR job, "just" emit this delete with the specific
> timestamp to remove any previous versions?
> JM
> 2013/7/15 Chao Shi <stepinto@...>
> > Hi HBase users,
> >
> > We have created an index table (say T2) of another table (say T1). The
> > clients who write to T1 also write an index record to T2 with the same
> > timestamp. Inconsistencies may accumulate over time, so we periodically
> > run an MR job that fully scans T1, builds an index, and bulk-loads the
> > result into T2.
> >
> > Because the MR job may run for a while, all new data written into T2
> > during that period must be kept and not overridden. So the MR job's
> > puts use the timestamp at which the job started.
> >
> > Then, once the index build succeeds, we want all data in T2 older than a
> > given timestamp to become invisible to readers and eventually be deleted
> > (e.g. during major compaction). For safety, we prefer setting this
> > explicitly rather than relying on the TTL feature, since we want old data
> > deleted only after the new data has been written. Does HBase currently
> > support this kind of operation?
> >
> > Thanks,
> > Chao
> >
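
A rough model of Jean-Marc's suggestion, and of why the job writes with its start timestamp: on read, the newest version of a cell wins, and a Delete carrying a timestamp masks every version at or below it. The sketch below uses plain Python lists in place of an HBase table; `read_latest`, `job_start` and the row contents are hypothetical illustrations, not HBase client calls.

```python
from typing import List, Optional, Tuple

# (timestamp, value) for one row/column in an in-memory model; this is not
# the HBase client API.
Cell = Tuple[int, str]


def read_latest(cells: List[Cell], delete_ts: Optional[int] = None) -> Optional[str]:
    """Return the newest value still visible to a reader.

    A delete marker at delete_ts hides every version with a timestamp
    <= delete_ts, like a Delete that removes all versions up to that timestamp.
    """
    visible = [c for c in cells if delete_ts is None or c[0] > delete_ts]
    if not visible:
        return None
    return max(visible, key=lambda c: c[0])[1]


job_start = 200           # hypothetical timestamp used for the MR job's puts
cutoff = job_start - 1    # delete everything written before the job started

# Row the job re-emits; a live client also wrote to it while the job ran.
rebuilt_row = [(100, "old"), (job_start, "rebuilt"), (250, "live-write")]
# Row that no longer corresponds to anything in T1.
stale_row = [(100, "old")]

print(read_latest(rebuilt_row, cutoff))  # live-write
print(read_latest(stale_row, cutoff))    # None
```

The live write survives because its timestamp is newer than the job's puts, and the stale row disappears only if something actually issues a delete for its key.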

Hi Jean-Marc,

Thanks for the reply.

I see that a delete can specify a timestamp, but I don't think that is what I
need. To clarify: in my scenario, I don't want to issue deletes for every key,
because I don't know exactly what to delete unless I do another full scan.
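
A minimal sketch of why per-key deletes imply another pass, assuming T2 is modeled as a Python dict from row key to (timestamp, value) pairs: the only way to find orphaned index rows is to visit every row and check whether anything wrote to it at or after the job's start timestamp. `find_stale_keys` and the row keys are hypothetical; a real job would do this with a scan over T2.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


def find_stale_keys(table: Dict[str, List[Cell]], job_start: int) -> List[str]:
    """Return row keys whose newest cell predates job_start.

    These are rows that neither the rebuild job nor a live client wrote
    after the job started. Finding them means visiting every row, which is
    the extra full scan that per-key deletes would require.
    """
    stale = []
    for key, cells in table.items():  # one pass over the whole table
        if not cells:
            continue
        newest = max(ts for ts, _ in cells)
        if newest < job_start:
            stale.append(key)
    return sorted(stale)


# Hypothetical index rows in T2.
t2 = {
    "alice|row1": [(100, "x"), (200, "x")],  # re-emitted by the rebuild job
    "bob|row2": [(100, "x")],                # orphaned: no longer in T1
    "carol|row3": [(230, "x")],              # written by a live client
}
print(find_stale_keys(t2, job_start=200))  # ['bob|row2']
```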

I'd like to know whether this is possible: set a min_timestamp on the
ColumnDescriptor. Once it is set, KVs older than this timestamp become
invisible to reads, and during major compaction these KVs are deleted. It
would be an absolute version of TTL.
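
To make the contrast with TTL concrete, here is a minimal Python model, assuming each column is a list of (timestamp, value) pairs and that timestamps and TTL share a unit. `min_timestamp` is the setting proposed above, not an existing ColumnDescriptor option, the TTL boundary is simplified, and all function names are hypothetical.

```python
from typing import List, Tuple

# (timestamp, value) pairs for one column; timestamps and TTL share a unit.
Cell = Tuple[int, str]


def visible_with_ttl(cells: List[Cell], ttl: int, now: int) -> List[Cell]:
    """Relative expiry (simplified TTL): a cell is hidden once it is at least
    ttl older than the read time, so what is visible changes as time passes."""
    return [c for c in cells if now - c[0] < ttl]


def visible_with_min_timestamp(cells: List[Cell], min_ts: int) -> List[Cell]:
    """Absolute cutoff (the proposed, hypothetical min_timestamp): cells older
    than min_ts are hidden no matter when the read happens."""
    return [c for c in cells if c[0] >= min_ts]


def major_compact(cells: List[Cell], min_ts: int) -> List[Cell]:
    """Compaction physically drops the cells that reads already hide."""
    return visible_with_min_timestamp(cells, min_ts)


cells = [(100, "old"), (200, "rebuilt")]

print(visible_with_ttl(cells, ttl=150, now=260))      # [(200, 'rebuilt')]
print(visible_with_ttl(cells, ttl=150, now=400))      # []
print(visible_with_min_timestamp(cells, min_ts=200))  # [(200, 'rebuilt')]
print(major_compact(cells, min_ts=200))               # [(200, 'rebuilt')]
```

Under the TTL model the rebuilt index eventually expires as well unless it is rewritten, whereas the absolute cutoff hides only data older than the chosen timestamp, which is what the rebuild workflow needs.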