The best metric at the moment is hbase.replication.sizeOfLogQueue,
published through JMX. If you have Ganglia, OpenTSDB or Cacti you can
graph how many logs per server are still waiting to be replicated,
which gives you a good idea of how much data is pending.
If it stays above 2 per server for a few minutes, you know that either
replication is slowing down or someone is inserting a lot of data.
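
As a rough sketch of that rule of thumb, here is how you could flag
servers once you have the sizeOfLogQueue values out of your metrics
system. Everything in it is an assumption for illustration: the
function name, the host names, and the idea that "a few minutes" means
3 consecutive one-minute samples. It does not talk to JMX or Ganglia;
it only applies the threshold to numbers you have already collected.

```python
from typing import Dict, List

QUEUE_THRESHOLD = 2    # "more than 2 per server", from the rule of thumb above
SUSTAINED_SAMPLES = 3  # assumption: 3 one-minute samples stand in for "a few minutes"


def lagging_servers(samples: Dict[str, List[int]],
                    threshold: int = QUEUE_THRESHOLD,
                    sustained: int = SUSTAINED_SAMPLES) -> List[str]:
    """Return servers whose latest `sustained` samples all exceed `threshold`.

    `samples` maps a server name to its sizeOfLogQueue readings, oldest first.
    Servers with fewer than `sustained` readings are never flagged.
    """
    flagged = []
    for server, values in samples.items():
        recent = values[-sustained:]
        if len(recent) == sustained and all(v > threshold for v in recent):
            flagged.append(server)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical readings, one per minute, oldest first.
    samples = {
        "rs1.example.com": [1, 1, 2, 1],
        "rs2.example.com": [1, 3, 4, 6],
        "rs3.example.com": [5, 1, 3, 3],
    }
    print(lagging_servers(samples))  # prints ['rs2.example.com']
```

A single spike (like the first reading on rs3) is ignored on purpose,
since a short burst of writes will push the queue up briefly without
replication actually being in trouble.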
On Thu, Sep 13, 2012 at 1:18 PM, Neil Yalowitz <[EMAIL PROTECTED]> wrote:
> Hi all,
> I'm using HBase replication between two clusters running CDH3u3 and I
> recently noticed that a replicated column family was "lagging" by more than
> a day... that is, it required more than 24 hours for a Put to replicate
> from master to slave. The root cause of the lag appears to be swapping and
> other bad behavior.
> The real question I have is this: how do I know the state of replication at
> any given time? Does a large amount of data in /hbase/.logs indicate that
> replication is falling behind? What about /hbase/.oldlogs which seems to
> grow forever? What red flags should I look for to tell me that there is a
> problem with replication?
> Neil Yalowitz
> [EMAIL PROTECTED]