HBase user mailing list: TTL performance


Hi Frédéric,

hbase.store.delete.expired.storefile - Set this property to true.

When it is enabled, store files whose entries have all expired past the
TTL are deleted outright before compaction, without having to be
rewritten.  If you are interested you can check HBASE-5199.

It is available in 0.94 and above.  Hope this helps.
Regards
Ram
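
This server-side property is normally set in hbase-site.xml on the region
servers, along the lines of:

    <property>
      <name>hbase.store.delete.expired.storefile</name>
      <value>true</value>
    </property>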

> -----Original Message-----
> From: Frédéric Fondement [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 25, 2012 2:05 PM
> To: [EMAIL PROTECTED]
> Subject: Re: TTL performance
>
> Hi,
>
> And thanks for your answers.
>
> Actually, I already control my major compactions using a cron job, at
> night, merely executing this bash code:
>
> echo "status 'detailed'" | hbase shell \
>   | grep "<<table prefix>>" \
>   | awk -F, '{print $1}' | tr -d ' ' \
>   | sort | uniq -c | sort -nr \
>   | awk '{print "major_compact " sprintf( "%c", 39 ) $2 sprintf( "%c", 39 )}' \
>   | hbase shell >>$LOGFILE 2>&1
>
> This pipeline extracts the table name from each region reported by
> status 'detailed', counts the regions per table, and makes sure the
> biggest tables (those with the most regions) are major-compacted first.
>
> I'm not using versions.
>
> My question was actually: given a table with millions, billions or
> whatever number of rows, how fast is the TTL handling process ? How are
> rows scanned during major compaction ? Are they all scanned in order to
> know whether they should be removed from the filesystem (be it HDFS or
> whatever else) ? Or is there some optimization that quickly finds the
> parts to be deleted ?
>
> Best regards,
>
> Frédéric.
>
>
> Le 21/06/2012 23:03, Andrew Purtell a écrit :
> > 2012/6/21, Frédéric Fondement <[EMAIL PROTECTED]>:
> >> opt3. looks the nicest (only 3-4 tables to scan when reading), but
> >> won't my daily major compact become crazy ?
> > If you want more control over the major compaction process, for
> > example to lessen the load on your production cluster to a constant
> > background level, remember that the HBase shell is the JRuby irb, so
> > you have the full power of the HBase API and Ruby. In the worst case
> > you can write a shell script that gets a list of regions and triggers
> > major compaction on each region separately, or according to whatever
> > policy you construct. The script invocation can happen manually or
> > out of crontab.
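
For illustration, a minimal JRuby sketch of the per-region loop described
above, assuming the 0.94-era HBaseAdmin API (the table name 'mytable' and
the pause length are placeholders); it could be run with
hbase org.jruby.Main compact_regions.rb:

    include Java
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.HBaseAdmin

    admin = HBaseAdmin.new(HBaseConfiguration.create)
    # Walk the table's regions and request a major compaction for each one,
    # pausing between requests: majorCompact only queues the request, so the
    # pause spreads the resulting load into the background.
    admin.getTableRegions('mytable'.to_java_bytes).each do |region|
      admin.majorCompact(region.getRegionName)
      sleep 60
    end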
> >
> > Another performance consideration is how many expired cells might have
> > to be skipped by a scan. If you have a wide area of the keyspace that
> > is all expired at once, then the scan will seem to "pause" while
> > traversing this area. However, you can use setTimeRange to bound your
> > scan by time range and then HBase can optimize whole HFiles away just
> > by examining their metadata. Therefore I would recommend using both
> > TTLs for automatic background garbage collection of expired entries,
> > as well as time range bounded scans for read time optimization.
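
A rough JRuby sketch of such a time-range-bounded scan (the table name,
the 24-hour window, and the output handling are placeholders):

    include Java
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.HTable
    import org.apache.hadoop.hbase.client.Scan

    # Only consider cells written during the last 24 hours; store files
    # whose time range lies entirely outside this window can be ruled out
    # from their metadata alone, without being read.
    now = java.lang.System.currentTimeMillis
    scan = Scan.new
    scan.setTimeRange(now - 24 * 60 * 60 * 1000, now)

    table = HTable.new(HBaseConfiguration.create, 'mytable')
    scanner = table.getScanner(scan)
    scanner.each { |result| puts result }
    scanner.close
    table.close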
> >
> > Incidentally, there was an interesting presentation at HBaseCon
> > recently regarding a creative use of timestamps:
> > http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-h-base-suraj-varma-gap-inc-finalupdated-last-minute
> > (slide 16).
> >
> > Best regards,
> >
> >     - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)