HBase, mail # user - Delete all data before a given timestamp


Chao Shi 2013-07-15, 10:36
Jean-Marc Spaggiari 2013-07-15, 16:48
Chao Shi 2013-07-16, 03:07
Re: Delete all data before a given timestamp
Ted Yu 2013-07-16, 04:25
Would this method (of Delete) serve your need?

  public Delete deleteFamily(byte [] family, long timestamp) {

From its Javadoc:

   * Delete all columns of the specified family with a timestamp less than
   * or equal to the specified timestamp.
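
To make the "less than or equal" semantics from that Javadoc concrete, here is a minimal, self-contained sketch. It is a toy in-memory model of a single row, not the HBase client API: the class FamilyDeleteSketch, its fields, and the sample family "idx" are all hypothetical names chosen for illustration. Against a real table, the call would be issued on a Delete built for one specific row key.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of ONE row (hypothetical, not HBase code). It only illustrates
// that a family delete at timestamp T masks cells with timestamp <= T.
class FamilyDeleteSketch {
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static final class Tombstone {
        final String family;
        final long timestamp;

        Tombstone(String family, long timestamp) {
            this.family = family;
            this.timestamp = timestamp;
        }
    }

    private final List<Cell> cells = new ArrayList<>();
    private final List<Tombstone> tombstones = new ArrayList<>();

    void put(String family, String qualifier, long timestamp, String value) {
        cells.add(new Cell(family, qualifier, timestamp, value));
    }

    // Models deleteFamily(family, timestamp): records a marker instead of
    // removing cells immediately.
    void deleteFamily(String family, long timestamp) {
        tombstones.add(new Tombstone(family, timestamp));
    }

    // Returns the values a reader would still see, in insertion order.
    List<String> visibleValues() {
        List<String> out = new ArrayList<>();
        for (Cell c : cells) {
            if (!isMasked(c)) {
                out.add(c.value);
            }
        }
        return out;
    }

    private boolean isMasked(Cell c) {
        for (Tombstone t : tombstones) {
            if (t.family.equals(c.family) && c.timestamp <= t.timestamp) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FamilyDeleteSketch row = new FamilyDeleteSketch();
        row.put("idx", "a", 100L, "old-a");
        row.put("idx", "b", 200L, "old-b");
        row.put("idx", "a", 300L, "new-a");
        row.deleteFamily("idx", 200L); // masks timestamps 100 and 200
        System.out.println(row.visibleValues()); // prints [new-a]
    }
}
```

Note that the model covers a single row, so the delete still has to be issued once per row key.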

On Mon, Jul 15, 2013 at 8:07 PM, Chao Shi <[EMAIL PROTECTED]> wrote:

> Jean-Marc Spaggiari <jean-marc@...> writes:
>
> >
> > When you send a delete command to the server, you can specify a timestamp.
> > So as the result of your MR job, "just" emit this delete with the specific
> > timestamp to remove any previous version?
> >
> > JM
> >
> > 2013/7/15 Chao Shi <stepinto@...>
> >
> > > Hi HBase users,
> > >
> > > We have created an index table (say T2) of another table (say T1). The
> > > clients who write to T1 also write an index record to T2 with the same
> > > timestamp. Inconsistencies may accumulate as time goes by, so we
> > > run an MR job periodically, which fully scans T1, builds an index, and
> > > bulk-loads the result to T2.
> > >
> > > Because the MR job may run for a while, all new data written to T2
> > > during that period must be kept and not be overridden. So the MR job
> > > creates puts using the timestamp at which the job starts.
> > >
> > > Then we want all data in T2 before a given timestamp to become invisible
> > > to reads after the index builds successfully, and to get deleted
> > > eventually (e.g. during major compaction). We prefer setting this
> > > explicitly over using the TTL feature for safety, as we want old data to
> > > be deleted only when the new data is written. Does HBase support this
> > > kind of operation for now?
> > >
> > > Thanks,
> > > Chao
> > >
> >
>
> Hi Jean-Marc,
>
> Thanks for the reply.
>
> I see that a delete can specify a timestamp, but I don't think that is what
> I need. To clarify, in my scenario, I don't want to issue deletes for every
> key (because I don't know what exactly to delete unless I do another full
> scan).
>
> I'd like to see if this is possible: set a min_timestamp on the
> ColumnDescriptor. Once done, KVs before this timestamp become invisible to
> reads. During major compaction, these KVs are deleted. It is the absolute
> version of TTL.
>
>
>
>
>
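
The behaviour requested in the quoted message can be sketched as a small, self-contained model. This is only an illustration of the proposal as described there: min_timestamp is the poster's suggested setting, not an existing HBase option shown in this thread, and the class MinTimestampSketch, its methods, and the sample timestamps are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed "absolute TTL" (hypothetical, not HBase code).
class MinTimestampSketch {
    static final class KeyValue {
        final String row;
        final long timestamp;
        final String value;

        KeyValue(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final List<KeyValue> store = new ArrayList<>();
    // Assumed per-family setting; by default nothing is hidden.
    private long minTimestamp = Long.MIN_VALUE;

    void put(String row, long timestamp, String value) {
        store.add(new KeyValue(row, timestamp, value));
    }

    void setMinTimestamp(long minTimestamp) {
        this.minTimestamp = minTimestamp;
    }

    // Reads skip every KV older than minTimestamp, without deleting it.
    List<String> scan() {
        List<String> out = new ArrayList<>();
        for (KeyValue kv : store) {
            if (kv.timestamp >= minTimestamp) {
                out.add(kv.value);
            }
        }
        return out;
    }

    // Models a major compaction: hidden KVs are physically dropped.
    void majorCompact() {
        store.removeIf(kv -> kv.timestamp < minTimestamp);
    }

    int storedCount() {
        return store.size();
    }

    public static void main(String[] args) {
        long jobStart = 1000L; // timestamp at which the rebuild job started
        MinTimestampSketch t2 = new MinTimestampSketch();
        t2.put("r1", 500L, "stale");        // index record from before the rebuild
        t2.put("r1", jobStart, "rebuilt");  // bulk-loaded with the job-start timestamp
        t2.put("r2", 1500L, "live");        // written by a client while the job ran

        t2.setMinTimestamp(jobStart);
        System.out.println(t2.scan());        // prints [rebuilt, live]
        System.out.println(t2.storedCount()); // prints 3
        t2.majorCompact();
        System.out.println(t2.storedCount()); // prints 2
    }
}
```

In this model no per-key delete is issued: raising the threshold hides the old data at once, and the compaction step reclaims it later.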
Jean-Marc Spaggiari 2013-07-16, 12:59
Jimmy Xiang 2013-07-16, 18:50
Chao Shi 2013-07-17, 03:35
Chao Shi 2013-07-17, 03:31
Chao Shi 2013-07-17, 03:24
lars hofhansl 2013-07-16, 03:52