HBase >> mail # user >> Compaction throttling and per-region compaction automation


Re: Compaction throttling and per-region compaction automation
Thanks Jeremy (I see you everywhere I turn!)

https://issues.apache.org/jira/browse/HBASE-5867 sounds like there is
compaction throttling in 0.96.0, no?

Lucene faces very similar problems as HBase, I think.
* An index has multiple segments.
* Files are added, not modified.
* Deletions are markers/tombstones.
* Optimization process purges deletes and rewrites segments.

But Lucene has some other options, such as:
* Optimize only partially, not all the way down to just 1 segment, but down
to N
* Only expunge deletes, don't actually merge segments and rewrite them to
disk
* Pick segments with most deletes first
* Throttle IO
  See:
    http://search-lucene.com/c/Lucene:core/src/java/org/apache/lucene/store/RateLimiter.java
    http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/RateLimiter.html
    http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/TieredMergePolicy.html
    http://search-lucene.com/?q=throttle+merge&fc_project=Lucene
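
To make the "Throttle IO" item a bit more concrete, here is a minimal sketch
of a byte-rate throttle in Java. It is not Lucene's RateLimiter and not
anything that exists in HBase; the class name SimpleThrottle and its methods
are made up for illustration. The only idea it shows is that a merge or
compaction writer asks for permission before writing n bytes and is put to
sleep whenever it is ahead of a configured MB/sec budget.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical byte-rate throttle (illustration only, not a Lucene or HBase class). */
final class SimpleThrottle {
    private final double bytesPerSec;
    // Earliest time (System.nanoTime scale) at which the next write may start.
    private long nextAllowedNanos;

    SimpleThrottle(double mbPerSec) {
        if (mbPerSec <= 0) {
            throw new IllegalArgumentException("mbPerSec must be positive");
        }
        this.bytesPerSec = mbPerSec * 1024 * 1024;
        this.nextAllowedNanos = System.nanoTime();
    }

    /** Time that writing the given number of bytes should take at the configured rate. */
    long costNanos(long bytes) {
        return (long) (bytes / bytesPerSec * 1_000_000_000L);
    }

    /**
     * Call before writing 'bytes'. Sleeps until earlier writes have been
     * "paid for", then books the cost of this write. Returns the nanoseconds
     * the caller was asked to wait (0 if it was within budget).
     */
    synchronized long acquire(long bytes) {
        long now = System.nanoTime();
        long waitNanos = Math.max(0L, nextAllowedNanos - now);
        // Idle time is not banked as credit: start from 'now' if we are behind.
        long base = waitNanos > 0 ? nextAllowedNanos : now;
        nextAllowedNanos = base + costNanos(bytes);
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
        }
        return waitNanos;
    }
}
```

A writer loop would call throttle.acquire(buffer.length) before each write.
Sharing one instance across all compaction threads would cap their combined
rate, since acquire is synchronized.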

Maybe some of the above is "borrowable" if throttling has not been
implemented yet.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Tue, Nov 6, 2012 at 1:59 PM, Jeremy Carroll <[EMAIL PROTECTED]> wrote:

> To date I have used the major / minor compaction threads to control how
> many compactions are allowed to run at one time on a per-RegionServer
> basis. I then compact a table and let the threads control how many regions
> can compact at once. Care is needed if minors are upgraded to majors, as
> there is no throttling of disk IO for majors, which can be very impactful.
>
> Aravind created an offline compaction script, which predates the
> threading implementation, that you may find useful.
>
> https://github.com/aravind/hbase_compact
>
> On Tue, Nov 6, 2012 at 10:38 AM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > Major compactions are..... you know... :)
> > So I saw there is https://issues.apache.org/jira/browse/HBASE-3743 for
> > throttling them.
> >
> > We are about to try the per-region compaction, so I was wondering if anyone
> > has written a tool/script to automate that a bit?
> >
> > Thanks,
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
>
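
For the per-region automation asked about at the bottom of the thread, a
rolling compaction driver could look roughly like the sketch below. It is a
client-side illustration only: RollingCompaction and RegionCompactor are
hypothetical names, and the hook is assumed to wrap whatever admin call
triggers a major compaction for one region and then wait until that
compaction has finished. The driver itself only bounds how many regions are
being compacted at once and reports which ones failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical rolling per-region compaction driver (illustration only). */
final class RollingCompaction {

    /** Assumed hook: trigger a compaction for one region and block until it is done. */
    interface RegionCompactor {
        void compactAndWait(String regionName) throws Exception;
    }

    /** Compacts the regions with at most maxConcurrent in flight; returns the failed ones. */
    static List<String> run(List<String> regions, int maxConcurrent, RegionCompactor compactor) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        List<Future<?>> futures = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        try {
            for (String region : regions) {
                // The pool size is what limits concurrency; extra tasks queue up.
                futures.add(pool.submit(() -> {
                    compactor.compactAndWait(region);
                    return null;
                }));
            }
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get();
                } catch (ExecutionException e) {
                    failed.add(regions.get(i));
                } catch (InterruptedException e) {
                    // Give up waiting: report everything not yet confirmed as failed.
                    Thread.currentThread().interrupt();
                    failed.addAll(regions.subList(i, regions.size()));
                    break;
                }
            }
        } finally {
            pool.shutdown();
        }
        return failed;
    }
}
```

Starting with maxConcurrent set to 1 and raising it while watching disk IO
would be the cautious way to use something like this, given the lack of IO
throttling for majors mentioned above.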