

Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
Ram, here's the master log corresponding to http://pastebin.com/cSdMbA2a.
Looks like e11e8b030897d6e5b973f8fe892e0eb2 was splitting on the
regionserver in question (node 169), so i'm guessing that's
22f8fa73d8af837410ca270f344f6bb8's mommy.

btw - you can see my balancer kick in 45 seconds later (runs every 10
minutes) here, but so far i think that's coincidence:
2012-11-05 00:25:29,893 INFO org.apache.hadoop.hbase.master.HMaster: BalanceSwitch=false

I followed the trail of e11e8b030897d6e5b973f8fe892e0eb2 back to node 169
and found all this stuff about a failed split: http://pastebin.com/xtXMZ388 and
an attempted rollback.  Looks like it errors out when it goes to put a node
in ZK but it's already there.  I'm not familiar with what a good split log
looks like, so i'll stop commenting for now...
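
For anyone who wants to follow a region through the logs the same way, here is a minimal sketch of the kind of filter that does it. It is not part of HBase: `trace_region` is a hypothetical helper, and it assumes each log line starts with the `yyyy-MM-dd HH:mm:ss,SSS` timestamp seen in the excerpts in this thread. It keeps only lines that mention the encoded region name and returns them oldest first, so master and regionserver logs can be merged into one timeline.

```python
import re
from datetime import datetime

# Leading timestamp as it appears in the log excerpts in this thread,
# e.g. "2012-11-05 00:25:29,893". Other log layouts would need another pattern.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def trace_region(lines, encoded_name):
    """Return (timestamp, line) pairs for lines mentioning encoded_name, oldest first.

    Hypothetical helper for illustration only. Lines without a leading
    timestamp (for example stack-trace continuation lines) are skipped.
    """
    hits = []
    for line in lines:
        if encoded_name not in line:
            continue
        match = TIMESTAMP.match(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        hits.append((stamp, line.rstrip("\n")))
    # Sort on the timestamp only, so lines from several log files interleave.
    hits.sort(key=lambda hit: hit[0])
    return hits
```

To use it, concatenate the lines of the master log and the regionserver log (file names depend on your setup) and pass them in together with the encoded name, for example `e11e8b030897d6e5b973f8fe892e0eb2`.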

On Mon, Nov 5, 2012 at 10:30 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:

> The log shows that the first time the region was transitioned to SPLITTING
> even then it was not populated in the Master's memory.
>
> On Tue, Nov 6, 2012 at 11:29 AM, ramkrishna vasudevan <
> [EMAIL PROTECTED]> wrote:
>
> > Could you attach the master logs at this time
> > 2012-11-05 00:24:55?
> >
> > Regards
> > Ram
> >
> > On Tue, Nov 6, 2012 at 11:15 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> >> Took a brief look through all SPLIT related commits since 0.94.0... Found these:
> >>
> >> HBASE-6854 *
> >> HBASE-6713
> >> HBASE-6329 *
> >>
> >> HBASE-6088
> >>
> >> HBASE-5986
> >> HBASE-6070 *
> >>
> >>
> >> The ones marked with * are (IMHO) more likely to be related.
> >>
> >> -- Lars
> >>
> >> ________________________________
> >> From: Matt Corgan <[EMAIL PROTECTED]>
> >> To: dev <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> >> Sent: Monday, November 5, 2012 9:28 PM
> >> Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
> >>
> >> Yeah - we were running .94.0 since it came out but never saw it there.
> >> I'll keep trying to narrow it down.  The only harm it's causing is log
> >> spam and failing to move daughters to a new regionserver, which are
> >> definitely problems, but it's not bringing down the cluster.
> >>
> >>
> >> On Mon, Nov 5, 2012 at 9:17 PM, lars hofhansl <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > So it seems you can repeat this to some extent in 0.94.2, but you have
> >> > never seen this before?
> >> >
> >> >
> >> > -- Lars
> >> >
> >> >
> >> >
> >> > ________________________________
> >> >  From: Matt Corgan <[EMAIL PROTECTED]>
> >> > To: dev <[EMAIL PROTECTED]>
> >> > Sent: Monday, November 5, 2012 9:10 PM
> >> > Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
> >> >
> >> > It happened in this new table that I have all the logs for.  The region
> >> > in question this time is 6839663e4f8f79be3d7469784c21cbc2, and the first
> >> > trace of this region is on the regionserver with the "Instantiated
> >> > tableName..." message
> >> >
> >> > 2012-11-05 22:24:21,162 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated
> >> > StatAreaModelLink,\x00\x00\x07\xD9\x00\x00\x00\x0C\x00\x00\x00\x004H\xC4\xB5\x00\x00\x00\x02\x00\x00\x00\x05\x00\x00\x00\x00G.l\x9B,1352172257535.6839663e4f8f79be3d7469784c21cbc2.
> >> >
> >> > I also know this region came from a recent split, but I can't find any
> >> > log messages showing the parent finishing the split that created this
> >> > daughter region.  So my guess now is that the split is actually finishing
> >> > and letting clients continue to write data, but something is failing to
> >> > print the log line and correctly tell the master about the new region.
> >> >
> >> > I've noticed that these regions are showing up on the clients in calls
> >> > to HTable.getRegionLocations(), so the clients continue to function, but
> >> > if I call HBaseAdmin.move() I get an UnknownRegionException.
> >> >
> >> >
> >> > On Mon, Nov 5, 2012 at 7:07 PM, Ted Yu <[EMAIL PROTECTED]> wrote: