HBase >> mail # user >> Delete all data before a given timestamp


Chao Shi 2013-07-15, 10:36
Jean-Marc Spaggiari 2013-07-15, 16:48
Chao Shi 2013-07-16, 03:07
Re: Delete all data before a given timestamp
Would this method of Delete serve your need?

  public Delete deleteFamily(byte[] family, long timestamp)

From its Javadoc:

   * Delete all columns of the specified family with a timestamp less than
   * or equal to the specified timestamp.
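
One detail of the Javadoc above matters for the index-rebuild case: the cutoff is inclusive. The following is a minimal sketch of that boundary, written as a plain in-memory model rather than against the HBase client library. The class name, method names, and timestamps are all made up for illustration. It only models a single row, since a Delete applies to one row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory model only: does not use the HBase client library. */
final class FamilyDeleteSketch {

    /** Per the Javadoc, a family delete at markerTs covers timestamps <= markerTs. */
    static boolean isMasked(long cellTs, long markerTs) {
        return cellTs <= markerTs;
    }

    /** Returns the cell timestamps of one row that stay visible after the delete. */
    static List<Long> visibleAfterDelete(List<Long> cellTimestamps, long markerTs) {
        List<Long> visible = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (!isMasked(ts, markerTs)) {
                visible.add(ts);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Hypothetical values: the MR job stamps its puts with jobStartTs.
        long jobStartTs = 1000L;
        List<Long> row = Arrays.asList(900L, 999L, 1000L, 1005L);

        // Deleting at jobStartTs also hides the rebuilt cell written at 1000.
        System.out.println(visibleAfterDelete(row, jobStartTs));     // [1005]
        // Deleting at jobStartTs - 1 keeps the rebuilt cell and later writes.
        System.out.println(visibleAfterDelete(row, jobStartTs - 1)); // [1000, 1005]
    }
}
```

Under these assumptions, a job that writes its puts at the job start timestamp would want to delete with that timestamp minus one, so the freshly built index records survive.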

On Mon, Jul 15, 2013 at 8:07 PM, Chao Shi <[EMAIL PROTECTED]> wrote:

> Jean-Marc Spaggiari <jean-marc@...> writes:
>
> >
> > When you send a delete command to the server, you can specify a timestamp.
> > So as the result of your MR job, "just" emit this delete with the specific
> > timestamp to remove any previous version?
> >
> > JM
> >
> > 2013/7/15 Chao Shi <stepinto@...>
> >
> > > Hi HBase users,
> > >
> > > We have created an index table (say T2) of another table (say T1). The
> > > clients who write to T1 also write an index record to T2 with the same
> > > timestamp. Inconsistencies may accumulate over time, so we run an MR job
> > > periodically, which fully scans T1, builds an index, and bulk-loads the
> > > result to T2.
> > >
> > > Because the MR job may run for a while, all new data written to T2 during
> > > that period must be kept and not overwritten. So the MR job creates puts
> > > using the timestamp at which the job starts.
> > >
> > > Then we want all data in T2 before a given timestamp to become invisible
> > > to reads after the index builds successfully, and to be deleted eventually
> > > (e.g. during major compaction). We prefer setting this explicitly over
> > > using the TTL feature for safety, as we want old data deleted only once
> > > the new data is written. Does HBase support this kind of operation for now?
> > >
> > > Thanks,
> > > Chao
> > >
> >
>
> Hi Jean-Marc,
>
> Thanks for the reply.
>
> I see delete can specify a timestamp, but I don't think that is what I need.
> To clarify, in my scenario, I don't want to issue deletes for every key
> (because I don't know what exactly to delete unless I do another full scan).
>
> I'd like to see if this is possible: set a min_timestamp on the
> ColumnDescriptor. Once done, KVs before this timestamp become invisible to
> reads. During major compaction, these KVs are deleted. It is the absolute
> version of TTL.
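
To make the quoted min_timestamp proposal concrete, here is a rough in-memory sketch of the semantics it describes, next to a TTL-style check for contrast. It is not an HBase API. The class, the method names, and the exact boundary handling are assumptions for illustration only, and the thread does not confirm that HBase offers such a setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory model of the proposed min_timestamp idea; not an HBase API. */
final class MinTimestampSketch {

    /** Hypothetical absolute cutoff: KVs older than minTimestamp are hidden. */
    static boolean visibleWithMinTimestamp(long kvTs, long minTimestamp) {
        return kvTs >= minTimestamp;
    }

    /** TTL-style check for comparison: the cutoff (now - ttl) moves with the clock. */
    static boolean visibleWithTtl(long kvTs, long nowMs, long ttlMs) {
        return kvTs >= nowMs - ttlMs;
    }

    /** Models a major compaction that rewrites the store without hidden KVs. */
    static List<Long> compact(List<Long> kvTimestamps, long minTimestamp) {
        List<Long> kept = new ArrayList<>();
        for (long ts : kvTimestamps) {
            if (visibleWithMinTimestamp(ts, minTimestamp)) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> store = Arrays.asList(900L, 999L, 1000L, 1005L);
        long minTimestamp = 1000L; // hypothetical: set once the rebuild succeeds

        // Reads filter immediately; compaction later drops the same KVs.
        System.out.println(compact(store, minTimestamp)); // [1000, 1005]

        // With TTL, the same KV expires just because time passes.
        System.out.println(visibleWithTtl(900L, 2000L, 1500L)); // true
        System.out.println(visibleWithTtl(900L, 3000L, 1500L)); // false
    }
}
```

The point of the contrast is the safety argument made earlier in the thread: with an absolute cutoff, old index data would disappear only when someone explicitly advances the cutoff after a successful rebuild, not when the clock does.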
Jean-Marc Spaggiari 2013-07-16, 12:59
Jimmy Xiang 2013-07-16, 18:50
Chao Shi 2013-07-17, 03:35
Chao Shi 2013-07-17, 03:31
Chao Shi 2013-07-17, 03:24
lars hofhansl 2013-07-16, 03:52