HBase >> mail # dev >> Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2


Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
I dug back further to find the origin of e11e8b030897d6e5b973f8fe892e0eb2
to see if it had previous problems that left ZK in a bad state.  Here are the
regionserver and master logs from about 2 hours earlier:
http://pastebin.com/qcvHjNCg

* Nov 4, 22:34: region is created as daughter b of a split on node 159
* Nov 4, 22:35: moved from node 159 to 169 by HBaseAdmin.move()
* Nov 5, 00:24: node 169 tries to split the region but gets Failed create
of ephemeral /hbase/unassigned/e11e8b030897d6e5b973f8fe892e0eb2

Is it possible that if something calls HBaseAdmin.move() on a daughter
region that is 30 seconds old, it could move the region but leave that ZK
node in a bad state?
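
To make the race in Ram's quoted explanation below easier to follow, here is a
toy model of it.  This is only a sketch: ToyZK, NodeExistsError, simulate() and
the True/False result of transition() are made up for illustration and are not
HBase or ZooKeeper APIs, and the real code paths are surely more involved.

```python
class NodeExistsError(Exception):
    """Raised by the toy store when a znode is created twice."""


class ToyZK:
    """In-memory stand-in for the znodes under /hbase/unassigned.

    Hypothetical helper for this sketch, not a ZooKeeper client.
    """

    def __init__(self):
        self.nodes = {}  # region name -> state string

    def create(self, region, state):
        if region in self.nodes:
            raise NodeExistsError(region)
        self.nodes[region] = state

    def delete(self, region):
        self.nodes.pop(region, None)

    def transition(self, region, expected, new):
        # Toy convention: True on success, False if the node is missing
        # or is not in the expected state.
        if self.nodes.get(region) != expected:
            return False
        self.nodes[region] = new
        return True


def simulate(check_transition):
    zk = ToyZK()
    rit = set()  # toy model of the master's regions-in-transition (RIT)
    region = "P1"

    # Split 1 starts: creates the SPLITTING node, master records P1 in RIT.
    zk.create(region, "SPLITTING")
    rit.add(region)

    # Split 2 starts for the same region and finds the node already there.
    try:
        zk.create(region, "SPLITTING")
    except NodeExistsError:
        # Rollback deletes the node, which actually belongs to split 1 ...
        zk.delete(region)
        # ... and the resulting nodeDeleted event clears P1 from RIT.
        rit.discard(region)

    # Split 1 finishes and tries SPLITTING -> SPLIT on a node that is gone.
    ok = zk.transition(region, "SPLITTING", "SPLIT")
    if check_transition and not ok:
        outcome = "split 1 noticed the missing node"
    else:
        outcome = "split 1 assumed success"
    return {
        "outcome": outcome,
        "node": zk.nodes.get(region),
        "in_rit": region in rit,
    }
```

With simulate(False) the run ends with no node in the toy ZK and nothing in RIT
even though split 1 believes it finished; simulate(True) shows where a check on
the transition result would catch the missing node.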

On Tue, Nov 6, 2012 at 7:13 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Thanks Ram!
>
> ----- Original Message -----
> From: ramkrishna vasudevan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Tuesday, November 6, 2012 3:29 AM
> Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
>
> Raised HBASE-7103 for the same.
>
> Regards
> Ram
>
> On Tue, Nov 6, 2012 at 3:37 PM, ramkrishna vasudevan <
> [EMAIL PROTECTED]> wrote:
>
> > Thanks for the logs.
> > I found the reason.
> >
> > The following steps happen:
> > -> Initially the parent region P1 starts splitting.
> > -> The split is going on normally.
> > -> Another split starts at the same time for the same region P1. (Not
> > sure why this started.)
> > -> Rollback happens on seeing an already existing node.
> > -> This node gets deleted in rollback and the nodeDeleted event starts.
> > -> In the nodeDeleted event the RIT for the region P1 gets deleted.
> > -> Because of this there is no region in RIT.
> > -> Now the first split gets over.  Here the problem is that we try to
> > transition the node from SPLITTING to SPLIT, but the node does not even
> > exist.  We don't take any action on this; we think it is successful.
> > -> Even before HBASE-6854 this could have happened.  Will file a JIRA for
> > the same.
> >
> > Regards
> > Ram
> >
> > On Tue, Nov 6, 2012 at 1:42 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> >
> >> Ram, here's the master log corresponding to http://pastebin.com/cSdMbA2a.
> >> Looks like e11e8b030897d6e5b973f8fe892e0eb2 was splitting on the
> >> regionserver in question (node 169), so i'm guessing that's
> >> 22f8fa73d8af837410ca270f344f6bb8's mommy.
> >>
> >> btw - you can see my balancer kick in 45 seconds later (runs every 10
> >> minutes) here, but so far i think that's coincidence:
> >> 2012-11-05 00:25:29,893 INFO org.apache.hadoop.hbase.master.HMaster:
> >> BalanceSwitch=false
> >>
> >> I followed the trail of e11e8b030897d6e5b973f8fe892e0eb2 back to node 169
> >> and found all this stuff about a failed split:
> >> http://pastebin.com/xtXMZ388 and
> >> an attempted rollback.  Looks like it errors out when it goes to put a
> >> node in ZK but it's already there.  I'm not familiar with what a good
> >> split log looks like, so i'll stop commenting for now...
> >>
> >>
> >> On Mon, Nov 5, 2012 at 10:30 PM, ramkrishna vasudevan <
> >> [EMAIL PROTECTED]> wrote:
> >>
> >> > The log shows that the first time the region was transitioned to
> >> > SPLITTING even then it was not populated in the Master's memory.
> >> >
> >> > On Tue, Nov 6, 2012 at 11:29 AM, ramkrishna vasudevan <
> >> > [EMAIL PROTECTED]> wrote:
> >> >
> >> > > Could you attach the master logs at this time
> >> > > 2012-11-05 00:24:55?
> >> > >
> >> > > Regards
> >> > > Ram
> >> > >
> >> > > On Tue, Nov 6, 2012 at 11:15 AM, lars hofhansl <[EMAIL PROTECTED]>
> >> > > wrote:
> >> > >
> >> > >> Took a brief look through all SPLIT related commits since 0.94.0...
> >> > >> Found these:
> >> > >>
> >> > >> HBASE-6854 *
> >> > >> HBASE-6713
> >> > >> HBASE-6329 *
> >> > >>
> >> > >> HBASE-6088
> >> > >>
> >> > >> HBASE-5986
> >> > >> HBASE-6070 *
> >> > >>
> >> > >>
> >> > >> The ones marked with * are (IMHO) more likely to be related.