-
Compaction throttling and per-region compaction automation
Otis Gospodnetic 2012-11-06, 18:38
Hi,

Major compactions are..... you know... :)
So I saw there is https://issues.apache.org/jira/browse/HBASE-3743 for throttling them.

We are about to try per-region compaction, so I was wondering if anyone has written a tool/script to automate that a bit?

Thanks,
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
-
Re: Compaction throttling and per-region compaction automation
Jeremy Carroll 2012-11-06, 18:59
To date I have used the major / minor compaction threads to control how many compactions are allowed to run at one time on a per-RegionServer basis. I then compact a table and let the threads control how many regions can compact at once. Take care if minors are upgraded to majors: there is no throttling of disk IO for majors, which can be very impactful.

Aravind created an offline compaction script, which pre-dates the threading implementation, that you may find useful:

https://github.com/aravind/hbase_compact
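One way to automate per-region compaction along these lines is to submit regions in waves, capping how many regions on any one RegionServer are compacting at the same time. The following is only a minimal sketch under stated assumptions: `compaction_waves` and `compact_table` are hypothetical helper names, and `compact_region` and `wait_until_idle` are placeholder callables that would have to wrap whatever you actually use to trigger and monitor compactions (for instance the HBase shell's `major_compact` command given a region name). It is not how the linked script works.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List


def compaction_waves(region_to_server: Dict[str, str],
                     max_per_server: int) -> Iterator[List[str]]:
    """Yield batches of regions such that no server appears more than
    max_per_server times within a single batch."""
    if max_per_server < 1:
        raise ValueError("max_per_server must be at least 1")

    # Group regions by the server that hosts them (sorted for determinism).
    by_server = defaultdict(list)
    for region, server in sorted(region_to_server.items()):
        by_server[server].append(region)

    # Take up to max_per_server regions from every server per wave.
    while any(by_server.values()):
        wave = []
        for server in sorted(by_server):
            wave.extend(by_server[server][:max_per_server])
            del by_server[server][:max_per_server]
        yield wave


def compact_table(region_to_server: Dict[str, str],
                  max_per_server: int,
                  compact_region: Callable[[str], None],
                  wait_until_idle: Callable[[List[str]], None]) -> None:
    """Submit one wave at a time; both callables are hypothetical hooks."""
    for wave in compaction_waves(region_to_server, max_per_server):
        for region in wave:
            compact_region(region)
        # Block until this wave has finished before starting the next one.
        wait_until_idle(wave)
```

With three regions on one server, one region on another, and `max_per_server=2`, this produces two waves: the first holds two regions from the busy server plus the lone region from the other, the second holds the remaining region.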
-
Re: Compaction throttling and per-region compaction automation
Otis Gospodnetic 2012-11-06, 19:06
Thanks Jeremy (I see you everywhere I turn!)

https://issues.apache.org/jira/browse/HBASE-5867 makes it sound like there is compaction throttling in 0.96.0, no?

Lucene faces problems very similar to HBase's, I think:
* An index has multiple segments.
* Files are added, not modified.
* Deletions are markers/tombstones.
* The optimization process purges deletes and rewrites segments.

But there are some other options, like:
* Optimize only partially, not all the way down to just 1 segment, but down to N
* Only expunge deletes, don't actually merge segments and rewrite them to disk
* Pick segments with most deletes first
* Throttle IO

See:
http://search-lucene.com/c/Lucene:core/src/java/org/apache/lucene/store/RateLimiter.java
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/RateLimiter.html
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/TieredMergePolicy.html
http://search-lucene.com/?q=throttle+merge&fc_project=Lucene

Maybe some of the above is "borrowable" if throttling has not been implemented yet.

Otis
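To make the "Throttle IO" option concrete, here is a minimal sketch of a byte-rate limiter in the spirit of the idea behind the linked RateLimiter class. It is not a port of Lucene's code: the class name `SimpleRateLimiter`, its constructor arguments, and the injectable `clock` and `sleep` hooks are all assumptions made for illustration. A writer would call `pause(n)` around each chunk of `n` bytes it writes during a merge or compaction.

```python
import time


class SimpleRateLimiter:
    """Keeps the average write rate at or below mb_per_sec by sleeping.

    Hypothetical illustration only; not Lucene's or HBase's implementation.
    """

    def __init__(self, mb_per_sec, clock=time.monotonic, sleep=time.sleep):
        if mb_per_sec <= 0:
            raise ValueError("mb_per_sec must be positive")
        self._bytes_per_sec = mb_per_sec * 1024 * 1024
        self._clock = clock
        self._sleep = sleep
        # Earliest time at which the next chunk may start.
        self._next_allowed = clock()

    def pause(self, num_bytes):
        """Account for num_bytes and sleep if the writer is ahead of the
        allowed rate. Returns the number of seconds slept."""
        now = self._clock()
        start = max(self._next_allowed, now)
        # Reserve the time slot this chunk is allowed to occupy.
        self._next_allowed = start + num_bytes / self._bytes_per_sec
        delay = start - now
        if delay > 0:
            self._sleep(delay)
            return delay
        return 0.0
```

At a 1 MB/s limit the first 1 MiB chunk passes without delay, while a second 1 MiB chunk requested immediately afterwards waits about one second. Idle time is not banked, so a long gap between writes does not allow a later burst above the limit.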
-
Re: Compaction throttling and per-region compaction automation
Jeremy Carroll 2012-11-06, 19:24
We ran into the throttleSize here at Klout in this issue (https://issues.apache.org/jira/browse/HBASE-592). Everything was promoted to the major compaction threads as a result. It was an error on our part, since we were bulk loading files and not using puts / flush sizes for its compaction logic.

The patches are not IO rate limiting, but a way to determine how many compaction threads are available to run (queues), and when items are promoted from the small (minor) queue to the large (major) queue.

I would welcome any real IO throttling on a per-server basis. ;)
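The promotion behaviour described here can be illustrated with a small sketch. It assumes that a compaction is routed to the large queue when the combined size of its input files exceeds a throttle threshold; the function name `pick_queue`, the 2.5 GB threshold, and the file sizes are made-up values for illustration, not HBase defaults or its actual code.

```python
from typing import List

MB = 1024 * 1024


def pick_queue(file_sizes: List[int], throttle_bytes: int) -> str:
    """Route a compaction to the "large" queue when the combined size of
    its input files exceeds the throttle threshold, else to "small".

    Assumed rule for illustration; the threshold is a plain parameter.
    """
    return "large" if sum(file_sizes) > throttle_bytes else "small"


# Hypothetical numbers: files produced by memstore flushes are small,
# while bulk-loaded files can be far larger than any flush would produce.
throttle = 2560 * MB
flushed_files = [128 * MB] * 3
bulk_loaded_files = [1024 * MB] * 3

flushed_queue = pick_queue(flushed_files, throttle)
bulk_queue = pick_queue(bulk_loaded_files, throttle)
```

Under these assumed numbers the flush-sized files stay in the small queue and the bulk-loaded files land in the large queue. That matches the situation described above, where a threshold sized around flush output sends every compaction of bulk-loaded files to the major compaction threads.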
-
Re: Compaction throttling and per-region compaction automation
Jeremy Carroll 2012-11-06, 19:25
Wrong issue. https://issues.apache.org/jira/browse/HBASE-5920