Re: Question about compactions
Jean-Daniel Cryans 2013-03-21, 17:49
On Thu, Mar 21, 2013 at 6:46 AM, Brennon Church <[EMAIL PROTECTED]> wrote:
> Hello all,
> As I understand it, a common performance tweak is to disable major
> compactions so that you don't end up with storms taking things out at
> inconvenient times. I'm thinking that I should just write a quick script to
> rotate through all of our regions, one at a time, and compact them. Again,
> if I'm understanding this correctly we should not end up with storms as
> they'll only happen one at a time, and each one doesn't run for long. Does
> that seem reasonable, or am I missing something? My hope is to run the
> script regularly.
FWIW, major compacting isn't even needed if you don't update or delete
cells, so do consider that too.
The problem with scheduling major compactions yourself is that, since
the command is asynchronous, you can still end up with a storm of
compactions if you just blindly issue major_compact for all your
regions. Adding a wait time between requests works, but if, say, you
want the compactions to run only between 2 and 4 AM, you can run out
of time. What I have seen people do to get around this is compact only
a subset of the regions at a time. You can also use JMX to monitor the
compaction queue on each region server (RS) and make sure you are not
just piling requests up, but this requires some more work.
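A minimal sketch of the subset-at-a-time idea, written as the kind of quick script discussed above. It assumes two hypothetical callbacks you would have to supply yourself: `request_major_compaction(region)` (for example, a wrapper around the shell's major_compact) and `queue_length()` (for example, a JMX read of the compaction queue). The function name, batch size, queue threshold and polling interval are all illustrative, not tuned recommendations.

```python
import time
from typing import Callable, Iterable, List


def compact_in_batches(
    regions: Iterable[str],
    request_major_compaction: Callable[[str], None],  # hypothetical callback
    queue_length: Callable[[], int],                  # hypothetical callback
    window_end: float,
    batch_size: int = 5,
    max_queue: int = 2,
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Request major compactions a few regions at a time.

    The request is asynchronous, so instead of a fixed wait we poll
    queue_length() and only send the next batch once the queue has
    drained to max_queue or below. Returns the regions that were not
    requested before window_end so a later run can resume with them.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = list(regions)
    while pending:
        # Stop issuing requests once the maintenance window is over.
        if clock() >= window_end:
            break
        # Back off while the region servers still have work queued.
        if queue_length() > max_queue:
            sleep(poll_seconds)
            continue
        batch, pending = pending[:batch_size], pending[batch_size:]
        for region in batch:
            request_major_compaction(region)
        # Give the servers time to enqueue the batch before re-checking.
        sleep(poll_seconds)
    return pending
```

Regions left over when the window closes are returned rather than dropped, so the next night's run can start with them instead of starting over. The clock and sleep functions are injectable only so the scheduling logic can be exercised without actually waiting.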
> Corollary question... I recently added drives to our nodes and since I did
> this while they were all still running, basically just restarting the
> datanode underneath to pick up the new spindles, I'm fairly sure I've thrown
> data locality out the window, based on the changed pattern of network
Interesting, but unlikely. Even restarting HBase shouldn't do that
unless it was restarted incorrectly. Each RS publishes a locality index
(hdfsBlocksLocalityIndex) that you can find via JMX or in its web
UI; are they close to 100% or way down? Also, which version are you on?
> If I'm right, manually running major compactions against all of
> the regions should resolve that, as the underlying data would all get
> written locally. Again, does that make sense?
Major compacting would do that, yes, but I think you should first check
whether you need it at all.
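One way to do that check across many region servers is a small script that reads each server's locality index over JMX. This is a sketch under assumptions: it assumes the RS web UI serves Hadoop's JSON `/jmx` endpoint (a top-level `beans` list of objects) and that the attribute is named exactly `hdfsBlocksLocalityIndex`, as above. The bean that carries it differs between versions, so the helper searches every bean. The helper names, host/port and the 90% threshold are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def find_jmx_attribute(payload: Dict[str, Any], attribute: str) -> Optional[float]:
    """Return the first numeric value of `attribute` found in any bean."""
    for bean in payload.get("beans", []):
        value = bean.get(attribute)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return None


def fetch_jmx(host: str, port: int, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch the JSON dump from a server's /jmx endpoint (assumed URL layout)."""
    url = "http://%s:%d/jmx" % (host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def low_locality_servers(
    payloads: Dict[str, Dict[str, Any]], threshold: float = 90.0
) -> Dict[str, float]:
    """Map server name -> locality index for servers below `threshold` percent."""
    low = {}
    for server, payload in payloads.items():
        locality = find_jmx_attribute(payload, "hdfsBlocksLocalityIndex")
        if locality is not None and locality < threshold:
            low[server] = locality
    return low
```

Usage would be along the lines of `low_locality_servers({h: fetch_jmx(h, port) for h in hosts})`, with `port` set to your region servers' info port. If nothing comes back, a blanket major compaction for locality's sake is probably unnecessary.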