HBase >> mail # user >> TTL performance


Hi Frédéric

hbase.store.delete.expired.storefile - Set this property to true.

This property lets HBase drop store files whose contents have entirely
expired, without having to compact them first.  If you are interested
you can check HBASE-5199.
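
For illustration, a minimal sketch of how it could be wired up.  The
table name 'events', the family 'd' and the 7-day TTL are made-up
examples; TTL is given in seconds.

  <!-- hbase-site.xml: allow fully expired store files to be dropped -->
  <property>
    <name>hbase.store.delete.expired.storefile</name>
    <value>true</value>
  </property>

  # HBase shell: give the column family a TTL (table is offline during alter)
  disable 'events'
  alter 'events', {NAME => 'd', TTL => 604800}   # cells expire after 7 days
  enable 'events'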

It is available in 0.94 and above.  Hope this helps.
Regards
Ram

> -----Original Message-----
> From: Frédéric Fondement [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 25, 2012 2:05 PM
> To: [EMAIL PROTECTED]
> Subject: Re: TTL performance
>
> Hi,
>
> And thanks for your answers.
>
> Actually, I already have control over my major compactions using a
> cron job at night, merely executing this bash code:
> echo "status 'detailed'" | hbase shell | grep "<<table prefix>>" \
>   | awk -F, '{print $1}' | tr -d ' ' | sort | uniq -c | sort -nr \
>   | awk '{print "major_compact " sprintf( "%c", 39 ) $2 sprintf( "%c", 39 )}' \
>   | hbase shell >>$LOGFILE 2>&1
> This makes sure the tables with the most regions are major-compacted
> first.
>
> I'm not using versions.
>
> My question was actually: given a table with millions, billions or
> whatever number of rows, how fast is the TTL handling process? How are
> rows scanned during major compaction? Are they all scanned in order to
> know whether they should be removed from the filesystem (be it HDFS or
> whatever else)? Or is there an optimization that lets it quickly find
> the parts to be deleted?
>
> Best regards,
>
> Frédéric.
>
>
> On 21/06/2012 23:03, Andrew Purtell wrote:
> >> 2012/6/21, Frédéric Fondement <[EMAIL PROTECTED]>:
> >> opt3. looks the nicest (only 3-4 tables to scan when reading), but
> >> won't my daily major compact become crazy?
> > If you want more control over the major compaction process, for
> > example to lessen the load on your production cluster to a constant
> > background level: the HBase shell is JRuby's irb, so you have the
> > full power of the HBase API and Ruby.  In the worst case you can
> > write a shell script that gets a list of regions and triggers major
> > compaction on each region separately, or according to whatever policy
> > you construct (see the sketch below).  The script invocation can
> > happen manually or out of crontab.
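> >
> > For instance, a rough sketch of such a script (run it with: hbase
> > org.jruby.Main compact_regions.rb).  The file name, the 60-second
> > pacing and the exact admin calls are illustrative assumptions and
> > may need adjusting for your HBase version:
> >
> >   # compact_regions.rb -- sketch only, not production code.
> >   # Major-compacts every region of every table, one region at a
> >   # time, sleeping in between to keep the load low and constant.
> >   include Java
> >   java_import org.apache.hadoop.hbase.HBaseConfiguration
> >   java_import org.apache.hadoop.hbase.client.HBaseAdmin
> >
> >   admin = HBaseAdmin.new(HBaseConfiguration.create)
> >   admin.listTables.each do |desc|
> >     table = desc.getNameAsString
> >     admin.getTableRegions(table.to_java_bytes).each do |region|
> >       admin.majorCompact(region.getRegionNameAsString)
> >       sleep 60   # throttle: one region per minute, tune to taste
> >     end
> >   end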
> >
> > Another performance consideration is how many expired cells might
> > have
> > to be skipped by a scan. If you have a wide area of the keyspace that
> > is all expired at once, then the scan will seem to "pause" while
> > traversing this area. However, you can use setTimeRange to bound your
> > scan by time range and then HBase can optimize whole HFiles away just
> > by examining their metadata. Therefore I would recommend using both
> > TTLs for automatic background garbage collection of expired entries,
> > as well as time range bounded scans for read time optimization.
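> >
> > As a small illustration, a time-bounded scan from the shell (the
> > table name and the millisecond timestamps are made up; in client
> > code the equivalent call is Scan#setTimeRange):
> >
> >   # Only cells written inside this time window are returned; HFiles
> >   # entirely outside the range can be skipped from their metadata.
> >   scan 'events', {TIMERANGE => [1340000000000, 1340600000000]}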
> >
> > Incidentally, there was an interesting presentation at HBaseCon
> > recently regarding a creative use of timestamps:
> > http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-h-base-suraj-varma-gap-inc-finalupdated-last-minute
> > (slide 16).
> >
> > Best regards,
> >
> >     - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)