Re: Question about compactions
Related to this discussion, Jimmy provided a function to check the
compaction state in HBASE-6033, but that is in 0.95 only.
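
For reference, a minimal sketch of checking that state, assuming the
HBaseAdmin.getCompactionState() call added by HBASE-6033 (0.95+); the
table name is made up, and the CompactionState import path may differ
across versions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;

public class CompactionStateCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Returns NONE, MINOR, MAJOR, or MAJOR_AND_MINOR for the given
      // table (or region). "mytable" is a hypothetical table name.
      CompactionState state = admin.getCompactionState("mytable");
      System.out.println("Compaction state: " + state);
    } finally {
      admin.close();
    }
  }
}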

On Thu, Mar 21, 2013 at 10:49 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 21, 2013 at 6:46 AM, Brennon Church <[EMAIL PROTECTED]>
> wrote:
> > Hello all,
> >
> > As I understand it, a common performance tweak is to disable major
> > compactions so that you don't end up with storms taking things out
> > at inconvenient times. I'm thinking that I should just write a quick
> > script to rotate through all of our regions, one at a time, and
> > compact them. Again, if I'm understanding this correctly, we should
> > not end up with storms as they'll only happen one at a time, and
> > each one doesn't run for long. Does that seem reasonable, or am I
> > missing something? My hope is to run the script regularly.
>
> FWIW, major compacting isn't even needed if you don't update or
> delete cells, so do consider that too.
>
> The problem with scheduling major compactions yourself is that, since
> the command is async, you can still end up with a storm of compactions
> if you just blindly issue major_compact for all your regions. Adding
> wait time between regions works, but if, say, you want the compactions
> to run only between 2 and 4 AM, you can run out of time. What I have
> seen done to circumvent this is to only do a subset of the regions at
> a time. You can also use JMX to monitor the compaction queue on each
> RS and make sure you are not just piling them up, but this requires
> some more work (see the sketch below).
>
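
To make the one-at-a-time approach concrete, here is a minimal sketch
against the 0.94-era HBaseAdmin API; the table name, pacing, and time
window are made-up values, and since majorCompact() only queues the
request, the sleep is a crude stand-in for watching the compaction
queue over JMX:

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RollingMajorCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Hypothetical window: stop queueing new compactions after 2 hours.
    long deadline = System.currentTimeMillis() + 2L * 60 * 60 * 1000;
    try {
      List<HRegionInfo> regions = admin.getTableRegions(Bytes.toBytes("mytable"));
      for (HRegionInfo region : regions) {
        if (System.currentTimeMillis() > deadline) {
          System.out.println("Window closed; compact the rest next run.");
          break;
        }
        // majorCompact() is async: it only queues the request on the RS.
        admin.majorCompact(region.getRegionName());
        System.out.println("Queued " + region.getRegionNameAsString());
        Thread.sleep(60 * 1000L); // crude pacing between regions
      }
    } finally {
      admin.close();
    }
  }
}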
> >
> > Corollary question... I recently added drives to our nodes and since
> > I did this while they were all still running, basically just
> > restarting the datanode underneath to pick up the new spindles, I'm
> > fairly sure I've thrown data locality out the window, based on the
> > changed pattern of network traffic.
>
> Interesting, but unlikely. Even restarting HBase shouldn't do that
> unless it was restarted the wrong way. Each RS publishes a locality
> index (hdfsBlocksLocalityIndex) that you can find via JMX or in its
> web UI; are they close to 100%, or way down? Also, which version are
> you on? (A sketch of reading the metric via JMX follows below.)
>
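
For reference, a minimal sketch of reading that metric remotely; the
MBean name matches the 0.94-era RegionServerStatistics bean and the
host/port are hypothetical, so verify both (e.g. with jconsole) before
relying on this. The same bean also exposes compactionQueueSize, which
covers the queue monitoring mentioned above:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LocalityIndexCheck {
  public static void main(String[] args) throws Exception {
    // Hypothetical RS host and JMX port; requires JMX remoting to be
    // enabled on the RegionServer JVM.
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://rs-host.example.com:10102/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection mbsc = connector.getMBeanServerConnection();
      // Assumed 0.94-era MBean name; later versions moved the metrics.
      ObjectName rsStats =
          new ObjectName("hadoop:service=RegionServer,name=RegionServerStatistics");
      // Close to 100 means reads are served from local disks.
      Object locality = mbsc.getAttribute(rsStats, "hdfsBlocksLocalityIndex");
      Object queue = mbsc.getAttribute(rsStats, "compactionQueueSize");
      System.out.println("hdfsBlocksLocalityIndex=" + locality
          + " compactionQueueSize=" + queue);
    } finally {
      connector.close();
    }
  }
}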
> > If I'm right, manually running major compactions against all of
> > the regions should resolve that, as the underlying data would all get
> > written locally.  Again, does that make sense?
>
> Major compacting would do that, yes, but first check whether you need
> it at all, I think.
>
> J-D
>