Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> infinite loop of RS_ZK_REGION_SPLIT on .94.2


Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
Here's a sample of the master's logs from yesterday. It doesn't line up
exactly with the other pastebin log, but there's 3GB of this from
yesterday: http://pastebin.com/wP2rNN1t

I'm pushing the cluster a bit by importing data, so the split code is
getting tested harder than normal. The regions are 500 MB-1 GB gzipped. I
can look into it more, but I'm still trying to figure out what to look for.

Thanks Ted,
Matt
On Sat, Nov 3, 2012 at 2:03 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Matt:
> This is the method which made the logging:
>   private static int tickleNodeSplit(ZooKeeperWatcher zkw,
>       HRegionInfo parent, HRegionInfo a, HRegionInfo b, ServerName serverName,
>       final int znodeVersion)
>   throws KeeperException, IOException {
>     byte [] payload = Writables.getBytes(a, b);
>     return ZKAssign.transitionNode(zkw, parent, serverName,
>       EventType.RS_ZK_REGION_SPLIT, EventType.RS_ZK_REGION_SPLIT,
>       znodeVersion, payload);
>   }
>
> transitionZKNode() calls tickleNodeSplit() while it waits for the master to
> process the split. Evidently something prevented the master from doing so.
>
> How large is the region?
>
> Can you pastebin the master log for that period?
>
> Thanks
>
> On Sat, Nov 3, 2012 at 1:54 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>
> > We upgraded from 0.94.0 to 0.94.2 last week and have started to encounter
> > infinite region-transition loops on splits. I'm not sure yet whether it
> > affects all splits or whether it's related to load. Our workaround so far
> > has been to restart the regionserver process.
> >
> > log snippet:
> > http://pastebin.com/LpienZ7B
> >
> > It's repeating these two lines:
> > 2012-11-02 01:35:33,312 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > regionserver:60020-0x13ab46479832b76 Attempting to transition node
> > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > RS_ZK_REGION_SPLIT
> > 2012-11-02 01:35:33,364 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > regionserver:60020-0x13ab46479832b76 Successfully transitioned node
> > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > RS_ZK_REGION_SPLIT
> >
> > with the occasional:
> > 2012-11-02 01:35:34,476 DEBUG
> > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the
> > master to process the split for cf3e9bc069e1888983c06dc8e053ffcf
> >
> > Should the region transition from RS_ZK_REGION_SPLIT to itself?  It looks
> > wrong, but I'm not familiar with the code at all.
> >
> > Thanks,
> > Matt
> >
>
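Ted's explanation suggests the RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT self-transition is deliberate: while the region server waits, it keeps rewriting the parent region's znode (presumably so each write bumps the znode version and nudges the master), and the wait only ends once the master has processed the split. The Java sketch below simulates that loop under stated assumptions. FakeSplitZnode, waitForMaster, the spin cap, and the rule that the master signals completion by deleting the znode are hypothetical illustrations, not HBase or ZooKeeper APIs.

```java
/**
 * Hypothetical simulation of the region server's "tickle" wait during a split.
 * FakeSplitZnode and waitForMaster are illustrative stand-ins, not HBase or ZooKeeper APIs.
 */
public class SplitTickleSketch {

    static final String SPLIT = "RS_ZK_REGION_SPLIT";

    /** In-memory stand-in for the parent region's versioned znode (assumption). */
    static final class FakeSplitZnode {
        private final int masterHandlesAtVersion; // -1: the master never processes the split
        private String state = SPLIT;
        private int version = 0;
        private boolean deleted = false;

        FakeSplitZnode(int masterHandlesAtVersion) {
            this.masterHandlesAtVersion = masterHandlesAtVersion;
        }

        /** Returns the new version, or -1 if the node is gone or state/version don't match. */
        int transition(String from, String to, int expectedVersion) {
            if (deleted || !state.equals(from) || version != expectedVersion) {
                return -1;
            }
            state = to;
            version++; // each rewrite bumps the version, even for SPLIT -> SPLIT
            if (masterHandlesAtVersion >= 0 && version >= masterHandlesAtVersion) {
                deleted = true; // assumed: the master removes the znode once the split is handled
            }
            return version;
        }
    }

    /**
     * Region-server side: rewrite SPLIT -> SPLIT until the transition fails (the master
     * consumed the znode) or maxSpins is reached. Returns the number of successful tickles.
     * The cap and the missing sleep exist only so this sketch terminates quickly.
     */
    static int waitForMaster(FakeSplitZnode node, int maxSpins) {
        int version = 0;
        int spins = 0;
        while (spins < maxSpins) {
            if (spins % 10 == 0) {
                System.out.println("Still waiting on the master to process the split");
            }
            version = node.transition(SPLIT, SPLIT, version);
            if (version == -1) {
                break; // transition failed: the master has processed the split
            }
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) {
        // Healthy case: the master handles the split after the third rewrite.
        int healthy = waitForMaster(new FakeSplitZnode(3), 50);
        System.out.println("master processed the split after " + healthy + " tickles");

        // Stuck case: the master never handles it, so only the cap ends the loop.
        int stuck = waitForMaster(new FakeSplitZnode(-1), 50);
        System.out.println("still looping after " + stuck + " tickles");
    }
}
```

In the stuck case the sketch reproduces the pattern in the log above: every SPLIT to SPLIT write succeeds, with a periodic "Still waiting" message, and nothing on the region-server side ever ends the wait. That is why the master log, rather than the regionserver log, is the place to look for the cause.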