HBase, mail # user - Re: Question about compactions


Re: Question about compactions
Ted Yu 2013-03-21, 18:08
Related to this discussion, Jimmy provided a function to check the
compaction state in HBASE-6033.
However, that is in 0.95 only.

On Thu, Mar 21, 2013 at 10:49 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 21, 2013 at 6:46 AM, Brennon Church <[EMAIL PROTECTED]>
> wrote:
> > Hello all,
> >
> > As I understand it, a common performance tweak is to disable major
> > compactions so that you don't end up with storms taking things out at
> > inconvenient times.  I'm thinking that I should just write a quick
> > script to rotate through all of our regions, one at a time, and compact
> > them.  Again, if I'm understanding this correctly we should not end up
> > with storms as they'll only happen one at a time, and each one doesn't
> > run for long.  Does that seem reasonable, or am I missing something?  My
> > hope is to run the script regularly.
>
> FWIW major compacting isn't even needed if you don't update or delete
> cells so do consider that too.
>
> The problem with scheduling major compactions yourself is that, since
> the command is async, you can still end up with a storm of compactions
> if you just blindly issue major_compact for all your regions. Things
> like adding wait time work, but then if you want the compactions to
> run only between 2 and 4 AM you can run out of time. What I have
> seen to circumvent this is to only do a subset of
> the regions at a time. You can also use JMX to monitor the compaction
> queue on each RS and make sure you are not just piling them up, but
> this requires some more work.
>
> >
> > Corollary question... I recently added drives to our nodes and since I
> > did this while they were all still running, basically just restarting
> > the datanode underneath to pick up the new spindles, I'm fairly sure
> > I've thrown data locality out the window, based on the changed pattern
> > of network traffic.
>
> Interesting but unlikely. Even restarting HBase shouldn't do that
> unless it was wrongly restarted. Each RS publishes a locality index
> (hdfsBlocksLocalityIndex) that you can find via JMX or in their web
> UI. Are they close to 100% or way down? Also, which version are you on?
>
> > If I'm right, manually running major compactions against all of
> > the regions should resolve that, as the underlying data would all get
> > written locally.  Again, does that make sense?
>
> Major compacting would do that, yes, but I think you should first
> check whether you need it at all.
>
> J-D
>
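
The approach J-D describes (compact only a subset of regions at a time, and watch the compaction queue so requests do not pile up) could be sketched roughly as below. This is only an illustration under stated assumptions, not an HBase API: compact_in_batches, request_major_compact and queue_length are hypothetical names. In practice the two callables would have to be wired to whatever issues major_compact for a region and to whatever reads the compaction queue size from each RS (for example over JMX). The deadline parameter stands in for the "only between 2 and 4 AM" window.

```python
import time
from typing import Callable, Iterable, List


def compact_in_batches(
    regions: Iterable[str],
    request_major_compact: Callable[[str], None],  # hypothetical: issues an async major_compact
    queue_length: Callable[[], int],               # hypothetical: current compaction queue size
    batch_size: int = 5,
    poll_seconds: float = 30.0,
    deadline: float = float("inf"),                # compared against clock()
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Request major compactions a few regions at a time.

    Returns the regions that were NOT submitted because the deadline passed,
    so a later run can pick up where this one stopped.
    """
    pending = list(regions)
    while pending:
        # The compaction request is async, so wait for the queue to drain
        # before submitting more; give up once the deadline is reached.
        while queue_length() > 0:
            if clock() >= deadline:
                return pending
            sleep(poll_seconds)
        if clock() >= deadline:
            return pending
        batch, pending = pending[:batch_size], pending[batch_size:]
        for region in batch:
            request_major_compact(region)
    return pending
```

Returning the unsubmitted regions rather than raising means a nightly job can persist that list and resume from it in the next window, which addresses the "run out of time" concern above.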