HBase >> mail # user >> Delete all data before a given timestamp


Re: Delete all data before a given timestamp
Yes, this is what we do now. We maintain a lower timestamp bound for the
scan. Once an index build is done, we raise the bound to a higher value.
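
The lower-bound approach can be sketched as a small in-memory model in plain Java. This is an illustration only, not HBase code: IndexReadBound, Cell, advanceTo and visible are hypothetical names, and the model assumes the bound is raised to the start timestamp of the rebuild job. In a real client the same bound would presumably be passed as the lower end of the scan's time range, e.g. Scan.setTimeRange(minTimestamp, Long.MAX_VALUE).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a read-side lower timestamp bound; not HBase API. */
final class IndexReadBound {

    /** A cell reduced to the two fields that matter for this sketch. */
    static final class Cell {
        final String row;
        final long timestamp;

        Cell(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // Cells with a timestamp below this value are hidden from readers.
    private long minTimestamp = 0L;

    long minTimestamp() {
        return minTimestamp;
    }

    /** Raise the bound once a rebuild whose puts carry jobStartTimestamp has succeeded. */
    void advanceTo(long jobStartTimestamp) {
        if (jobStartTimestamp < minTimestamp) {
            throw new IllegalArgumentException("the bound may only move forward");
        }
        minTimestamp = jobStartTimestamp;
    }

    /** Return only the cells a reader should see under the current bound. */
    List<Cell> visible(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.timestamp >= minTimestamp) {
                out.add(c);
            }
        }
        return out;
    }
}
```

Because the rebuilt index is written with the job's start timestamp and the bound is raised to that same value, the rebuilt cells and any cells written by clients while the job was running stay visible, while older cells drop out of reads.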
On Wed, Jul 17, 2013 at 2:50 AM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:

> When you set up the MR, does it help to set a proper timestamp filter or
> time range in the scan object?
>
>
> On Tue, Jul 16, 2013 at 5:59 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
> > Another option might be to set up the proper TTL on the table? You alter the
> > table to set the TTL to reflect your timestamp, then you run a compaction?
> > The issue is that you have to disable the table while you alter it.
> >
> > JM
> >
> > 2013/7/16 Ted Yu <[EMAIL PROTECTED]>
> >
> > > Would this method (of Delete) serve your need?
> > >
> > >   public Delete deleteFamily(byte [] family, long timestamp) {
> > >
> > > From its Javadoc:
> > >
> > >    * Delete all columns of the specified family with a timestamp less than
> > >    * or equal to the specified timestamp.
> > >
> > > On Mon, Jul 15, 2013 at 8:07 PM, Chao Shi <[EMAIL PROTECTED]> wrote:
> > >
> > > > Jean-Marc Spaggiari <jean-marc@...> writes:
> > > >
> > > > >
> > > > > > When you send a delete command to the server, you can specify a
> > > > > > timestamp. So as the result of your MR job, "just" emit this delete
> > > > > > with the specific timestamp to remove any previous version?
> > > > >
> > > > > JM
> > > > >
> > > > > 2013/7/15 Chao Shi <stepinto@...>
> > > > >
> > > > > > Hi HBase users,
> > > > > >
> > > > > > > We have created an index table (say T2) of another table (say T1).
> > > > > > > The clients who write to T1 also write an index record to T2 with
> > > > > > > the same timestamp. Inconsistencies may accumulate as time goes by,
> > > > > > > so we periodically run an MR job that fully scans T1, builds an
> > > > > > > index, and bulk-loads the result into T2.
> > > > > > >
> > > > > > > The MR job may run for a while, and all new data written to T2
> > > > > > > during that period must be kept and not overwritten. So the MR job
> > > > > > > creates its puts using the timestamp at which the job started.
> > > > > > >
> > > > > > > Then, once the index has been built successfully, we want all data
> > > > > > > in T2 before a given timestamp to become invisible to reads and to
> > > > > > > be deleted eventually (e.g. during major compaction). For safety, we
> > > > > > > prefer setting the timestamp explicitly over using the TTL feature,
> > > > > > > because we want old data to be deleted only once the new data has
> > > > > > > been written. Does HBase currently support this kind of operation?
> > > > > >
> > > > > > Thanks,
> > > > > > Chao
> > > > > >
> > > > >
> > > >
> > > > Hi Jean-Marc,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > > I see that a delete can specify a timestamp, but I don't think that is
> > > > > what I need. To clarify, in my scenario I don't want to issue deletes
> > > > > for every key (because I don't know exactly what to delete unless I do
> > > > > another full scan).
> > > > >
> > > > > I'd like to see if this is possible: set a min_timestamp on the
> > > > > ColumnDescriptor. Once set, KVs before this timestamp become invisible
> > > > > to reads. During major compaction, these KVs are deleted. It is the
> > > > > absolute version of TTL.
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
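
To make the "absolute version of TTL" idea from the original question concrete, here is a minimal sketch in plain Java contrasting the two expiry rules. It is a model under stated assumptions, not HBase internals: ExpiryModel and its methods are hypothetical names, all values are treated as milliseconds, and min_timestamp is the proposed setting from the thread rather than an existing HBase option.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model contrasting relative (TTL-style) and absolute (min_timestamp) expiry. */
final class ExpiryModel {

    /** Relative rule: a cell expires once it is older than ttlMs, measured at nowMs. */
    static boolean expiredByTtl(long cellTs, long nowMs, long ttlMs) {
        return nowMs - cellTs > ttlMs;
    }

    /** Absolute rule: a cell expires if it was written before the fixed cutoff. */
    static boolean expiredByMinTimestamp(long cellTs, long minTimestamp) {
        return cellTs < minTimestamp;
    }

    /** Models which cell timestamps a major compaction would keep under the absolute rule. */
    static List<Long> compact(List<Long> cellTimestamps, long minTimestamp) {
        List<Long> kept = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (!expiredByMinTimestamp(ts, minTimestamp)) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

The relative rule depends on the current time, so data can expire even if no rebuild has succeeded. The absolute rule only changes when the cutoff is moved explicitly, which matches the safety requirement stated in the question.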