

Otis Gospodnetic 2012-11-06, 18:38
Jeremy Carroll 2012-11-06, 18:59
Re: Compaction throttling and per-region compaction automation
Thanks Jeremy (I see you everywhere I turn!)

https://issues.apache.org/jira/browse/HBASE-5867 makes it sound like
compaction throttling is in 0.96.0, no?

Lucene faces problems very similar to HBase's, I think:
* An index has multiple segments.
* Files are added, not modified.
* Deletions are markers/tombstones.
* Optimization process purges deletes and rewrites segments.

But Lucene also has some other options, such as:
* Optimize only partially: not all the way down to just 1 segment, but down
  to N segments
* Only expunge deletes; don't actually merge segments and rewrite them to
  disk
* Pick segments with the most deletes first
* Throttle IO
  See:
  http://search-lucene.com/c/Lucene:core/src/java/org/apache/lucene/store/RateLimiter.java
  http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/RateLimiter.html
  http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/TieredMergePolicy.html
  http://search-lucene.com/?q=throttle+merge&fc_project=Lucene
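
To make the "Throttle IO" idea concrete, here is a minimal sketch of a pause-based limiter. It is only loosely inspired by the idea behind the RateLimiter linked above and is not its actual code; the class name SimpleRateLimiter and its methods are made up for illustration. The writer calls pause() after each chunk, and the limiter sleeps just long enough to keep the average rate at or below the target.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: caps average write throughput by sleeping between chunks. */
final class SimpleRateLimiter {
    private final double nanosPerByte;
    private long lastNanos; // when the previous chunk was considered finished

    SimpleRateLimiter(double mbPerSec) {
        if (mbPerSec <= 0) {
            throw new IllegalArgumentException("mbPerSec must be > 0");
        }
        this.nanosPerByte = 1_000_000_000.0 / (mbPerSec * 1024 * 1024);
        this.lastNanos = System.nanoTime();
    }

    /** Time, in nanoseconds, that writing the given bytes should take at the target rate. */
    long costNanos(long bytes) {
        return (long) (bytes * nanosPerByte);
    }

    /** Call after writing a chunk; sleeps if the writer is ahead of the target rate. Returns nanos it asked to sleep. */
    synchronized long pause(long bytes) {
        long target = lastNanos + costNanos(bytes);
        long now = System.nanoTime();
        long sleepNanos = target - now;
        if (sleepNanos <= 0) {
            lastNanos = now; // writer was already slower than the cap: no sleep, no banked credit
            return 0;
        }
        try {
            TimeUnit.NANOSECONDS.sleep(sleepNanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        lastNanos = target;
        return sleepNanos;
    }
}
```

A compaction or merge writer would call something like limiter.pause(chunk.length) after each write. Time already spent on the write itself counts against the budget, so a slow disk is not penalized twice.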

Maybe some of the above is "borrowable" if throttling has not been
implemented yet.
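
Of the options above, "pick segments with the most deletes first" could be sketched roughly as follows. This is an assumption-laden illustration, not how Lucene or HBase actually selects files: the FileStats type, its fields, and the minRatio/maxFiles parameters are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: choose the files where rewriting reclaims the most deleted entries. */
final class DeleteHeavyFirst {

    /** Made-up per-file statistics; a real system would read these from file metadata. */
    static final class FileStats {
        final String name;
        final long entries;
        final long deleted;

        FileStats(String name, long entries, long deleted) {
            this.name = name;
            this.entries = entries;
            this.deleted = deleted;
        }

        double deletedRatio() {
            return entries == 0 ? 0.0 : (double) deleted / entries;
        }
    }

    /** Returns at most maxFiles files whose deleted ratio is >= minRatio, highest ratio first. */
    static List<FileStats> pick(List<FileStats> files, double minRatio, int maxFiles) {
        List<FileStats> candidates = new ArrayList<>();
        for (FileStats f : files) {
            if (f.deletedRatio() >= minRatio) {
                candidates.add(f);
            }
        }
        candidates.sort(Comparator.comparingDouble(FileStats::deletedRatio).reversed());
        int limit = Math.max(0, Math.min(maxFiles, candidates.size()));
        return new ArrayList<>(candidates.subList(0, limit));
    }
}
```

Capping the result at maxFiles is what makes this a partial rewrite rather than a full one: only the most delete-heavy files are touched in each pass.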

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Tue, Nov 6, 2012 at 1:59 PM, Jeremy Carroll <[EMAIL PROTECTED]> wrote:

> To date I have used the major / minor compaction threads to control how
> many compactions are allowed to run at one time on each RegionServer.
> I then compact a table and let the threads control how many regions
> can compact at once. Take care if minors are upgraded to majors, as
> there is no throttling of disk IO for majors, which can be very impactful.
>
> Aravind created an offline compaction script, which predates the
> threading implementation, that you may find useful.
>
> https://github.com/aravind/hbase_compact
>
> On Tue, Nov 6, 2012 at 10:38 AM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > Major compactions are..... you know... :)
> > So I saw there is https://issues.apache.org/jira/browse/HBASE-3743 for
> > throttling them.
> >
> > We are about to try the per-region compaction, so I was wondering if anyone
> > has written a tool/script to automate that a bit?
> >
> > Thanks,
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
>
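
For the per-region automation asked about in the quoted thread, a rough sketch of the control loop is below: walk a list of regions and compact at most N at a time. Everything here is hypothetical, including the PerRegionCompaction class and the compactOne callback. In particular, the callback is assumed to block until that region's compaction has finished; as far as I know the HBase admin compaction request only queues the work, so a real tool would need to poll for completion inside the callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch: compact regions one by one with a cap on concurrency. */
final class PerRegionCompaction {

    /**
     * Runs compactOne for every region, at most maxConcurrent at a time.
     * Returns the regions whose compaction threw or was interrupted.
     */
    static List<String> compactAll(List<String> regions, int maxConcurrent,
                                   Consumer<String> compactOne) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String region : regions) {
                futures.add(pool.submit(() -> compactOne.accept(region)));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get(); // wait for this region to finish
                } catch (ExecutionException e) {
                    failed.add(regions.get(i));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failed.add(regions.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

The fixed-size pool plays the same role as the compaction thread settings Jeremy describes, except that the cap is enforced by the client-side tool rather than by each RegionServer.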
Jeremy Carroll 2012-11-06, 19:24
Jeremy Carroll 2012-11-06, 19:25