Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
Ted Yu 2012-11-03, 21:03
Matt:
This is the method that produced the logging:

    private static int tickleNodeSplit(ZooKeeperWatcher zkw,
        HRegionInfo parent, HRegionInfo a, HRegionInfo b,
        ServerName serverName, final int znodeVersion)
        throws KeeperException, IOException {
      byte [] payload = Writables.getBytes(a, b);
      return ZKAssign.transitionNode(zkw, parent, serverName,
        EventType.RS_ZK_REGION_SPLIT, EventType.RS_ZK_REGION_SPLIT,
        znodeVersion, payload);
    }

transitionZKNode() calls tickleNodeSplit() repeatedly while it waits for the master to process the split. Obviously something prevented the master from processing it.

How large is the region? Can you pastebin the master log for that period of time?

Thanks

On Sat, Nov 3, 2012 at 1:54 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> We upgraded from .94.0 to .94.2 last week and have started to encounter
> infinite loops of region transitions on splits. I'm not sure yet whether
> it affects all splits or whether it's related to load. The solution so far
> has been to restart the regionserver process.
>
> log snippet:
> http://pastebin.com/LpienZ7B
>
> It's repeating these two lines:
> 2012-11-02 01:35:33,312 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x13ab46479832b76 Attempting to transition node
> cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> RS_ZK_REGION_SPLIT
> 2012-11-02 01:35:33,364 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x13ab46479832b76 Successfully transitioned node
> cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> RS_ZK_REGION_SPLIT
>
> with the occasional:
> 2012-11-02 01:35:34,476 DEBUG
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the
> master to process the split for cf3e9bc069e1888983c06dc8e053ffcf
>
> Should the region transition from RS_ZK_REGION_SPLIT to itself? It looks
> wrong, but I'm not familiar with the code at all.
>
> Thanks,
> Matt
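To show why the log keeps repeating a RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT "transition", here is a minimal in-memory sketch of the wait loop described in the reply. It is not HBase code: FakeZnode, tickle, waitForMaster, masterDoneAfter and maxIterations are all hypothetical names. It assumes that a transition returns the new znode version on success and -1 on failure (for example once the master has deleted the znode). Each successful self-transition only bumps the version, so if the master never processes the split, nothing ends the loop; the sketch caps iterations with maxIterations to make that visible.

```java
public class SplitTickleSketch {

    // Hypothetical in-memory stand-in for the region's znode (not the HBase/ZooKeeper API).
    static final class FakeZnode {
        String state = "RS_ZK_REGION_SPLIT";
        int version = 0;
        boolean deletedByMaster = false;
    }

    // Models the idea behind tickleNodeSplit: a SPLIT -> SPLIT "transition" that succeeds
    // only if the znode still exists at the expected version, and then bumps the version.
    // Assumption: -1 signals a failed transition.
    static int tickle(FakeZnode node, int expectedVersion) {
        if (node.deletedByMaster || node.version != expectedVersion) {
            return -1; // node is gone or was changed by someone else
        }
        node.state = "RS_ZK_REGION_SPLIT"; // same state as before: only the version changes
        node.version++;
        return node.version;
    }

    // Keeps tickling until a transition fails. masterDoneAfter < 0 models a master that
    // never processes the split; maxIterations is a cap that the real loop would not have.
    static int waitForMaster(FakeZnode node, int masterDoneAfter, int maxIterations) {
        int version = node.version;
        int tickles = 0;
        while (tickles < maxIterations) {
            if (masterDoneAfter >= 0 && tickles == masterDoneAfter) {
                node.deletedByMaster = true; // master finished processing and removed the znode
            }
            int next = tickle(node, version);
            if (next == -1) {
                break; // transition failed: stop waiting
            }
            version = next;
            tickles++;
        }
        return tickles;
    }

    public static void main(String[] args) {
        System.out.println(waitForMaster(new FakeZnode(), 3, 1000));  // prints 3
        System.out.println(waitForMaster(new FakeZnode(), -1, 1000)); // prints 1000
    }
}
```

With masterDoneAfter set to 3, the loop exits after three tickles. With -1, a master that never acts, it runs until the artificial cap, which matches the endless pairs of "Attempting"/"Successfully transitioned" lines in Matt's log.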