HBase, mail # user - infinite loop of RS_ZK_REGION_SPLIT on .94.2


Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
Matt Corgan 2012-11-03, 21:29
Here's a sample of the master's logs from yesterday.  It's not correlated
exactly with the other pastebin log, but there's 3GB of this from
yesterday: http://pastebin.com/wP2rNN1t

I'm pushing the cluster a bit by importing data, so I'm testing the split
code harder than normal.  The regions are 500MB-1GB gzipped.  I can look into
it more, but I'm still trying to figure out what to look for.

Thanks Ted,
Matt
On Sat, Nov 3, 2012 at 2:03 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Matt:
> This is the method that produced the log lines:
>   private static int tickleNodeSplit(ZooKeeperWatcher zkw,
>       HRegionInfo parent, HRegionInfo a, HRegionInfo b, ServerName serverName,
>       final int znodeVersion)
>   throws KeeperException, IOException {
>     byte [] payload = Writables.getBytes(a, b);
>     return ZKAssign.transitionNode(zkw, parent, serverName,
>       EventType.RS_ZK_REGION_SPLIT, EventType.RS_ZK_REGION_SPLIT,
>       znodeVersion, payload);
>   }
>
> transitionZKNode() calls tickleNodeSplit() while waiting for the master to
> process the split. Evidently something prevented the master from doing so.
>
> How large is the region ?
>
> Can you pastebin master log for that period of time ?
>
> Thanks
>
> On Sat, Nov 3, 2012 at 1:54 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>
> > We upgraded from .94.0 to .94.2 last week and have started to encounter
> > infinite loops of region-transition on splits.  I'm not sure yet whether
> > it affects all splits or whether it's related to load.  The workaround so
> > far has been to restart the regionserver process.
> >
> > log snippet:
> > http://pastebin.com/LpienZ7B
> >
> > It's repeating these two lines:
> > 2012-11-02 01:35:33,312 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > regionserver:60020-0x13ab46479832b76 Attempting to transition node
> > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > RS_ZK_REGION_SPLIT
> > 2012-11-02 01:35:33,364 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > regionserver:60020-0x13ab46479832b76 Successfully transitioned node
> > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > RS_ZK_REGION_SPLIT
> >
> > with the occasional:
> > 2012-11-02 01:35:34,476 DEBUG
> > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the
> > master to process the split for cf3e9bc069e1888983c06dc8e053ffcf
> >
> > Should the region transition from RS_ZK_REGION_SPLIT to itself?  It looks
> > wrong, but I'm not familiar with the code at all.
> >
> > Thanks,
> > Matt
> >
>
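
For readers following the thread, here is a rough, self-contained sketch of why
a RS_ZK_REGION_SPLIT -> RS_ZK_REGION_SPLIT transition can be intentional: the
regionserver re-writes the same state (a "tickle") while it waits for the
master.  This is not HBase code.  FakeZNode, waitForMaster, the masterDone
predicate and the maxTickles bound are all hypothetical names invented for
illustration; nothing in the thread says HBase bounds the loop or detects the
master's response this way.

```java
import java.util.function.IntPredicate;

public class SplitTickleSketch {
    static final String SPLIT = "RS_ZK_REGION_SPLIT";

    /** Hypothetical stand-in for a znode: a state string plus a version counter. */
    static final class FakeZNode {
        private String state;
        private int version = 0;

        FakeZNode(String state) { this.state = state; }

        /** Versioned compare-and-set; returns the new version, or -1 on mismatch. */
        synchronized int transition(String from, String to, int expectedVersion) {
            if (!state.equals(from) || version != expectedVersion) {
                return -1;
            }
            state = to;   // may be the same state: that is the "tickle"
            version++;    // the write still bumps the version
            return version;
        }

        synchronized int getVersion() { return version; }
    }

    /**
     * Tickles the node until masterDone reports true (assumed signal) or
     * maxTickles is reached. Returns the number of tickles performed before
     * the master responded, or -1 if the bound was hit first.
     */
    static int waitForMaster(FakeZNode node, IntPredicate masterDone, int maxTickles) {
        int version = node.getVersion();
        for (int i = 0; i < maxTickles; i++) {
            if (masterDone.test(i)) {
                return i;
            }
            version = node.transition(SPLIT, SPLIT, version);
            if (version == -1) {
                throw new IllegalStateException("node changed underneath us");
            }
        }
        return -1; // master never responded within the bound
    }

    public static void main(String[] args) {
        // Master responds after three tickles.
        FakeZNode node = new FakeZNode(SPLIT);
        System.out.println("tickles: " + waitForMaster(node, n -> n == 3, 10));

        // Master never responds: without a bound this would spin forever.
        FakeZNode stuck = new FakeZNode(SPLIT);
        System.out.println("result: " + waitForMaster(stuck, n -> false, 10));
    }
}
```

In this toy model, a master that never responds makes every tickle succeed,
which matches the repeating "Attempting to transition" / "Successfully
transitioned" pairs in the log above: the loop itself is working, and the
thing to investigate is why the master side never finishes.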