HBase >> mail # dev >> Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2


Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
Raised HBASE-7103 for the same.

Regards
Ram

On Tue, Nov 6, 2012 at 3:37 PM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:

> Thanks for the logs.
> I found the reason.
>
> The following steps happen:
> -> Initially the parent region P1 starts splitting.
> -> The split proceeds normally.
> -> Another split starts at the same time for the same region P1. (Not sure
> why this started.)
> -> Rollback happens on seeing an already existing node.
> -> This node gets deleted in the rollback and a nodeDeleted event fires.
> -> In the nodeDeleted event the RIT entry for region P1 gets deleted.
> -> Because of this the region is no longer in RIT.
> -> Now the first split finishes.  Here the problem is that we try to
> transition the node from SPLITTING to SPLIT, but the node does not even exist.
> We don't take any action on this; we assume it was successful.
> -> Even before HBASE-6854 this could have happened.  Will file a JIRA for
> the same.
>
> Regards
> Ram
>
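The sequence Ram describes can be sketched in a few lines of Java. This is only an illustration, not HBase code: the class `SplitRaceSketch`, its two in-memory maps (standing in for the ZooKeeper nodes and the master's regions-in-transition map), and every method name are hypothetical. The `checkTransition` flag shows one possible way to react to the failed transition; it is not meant to describe the fix that went into HBASE-7103.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, in-memory model of the race described in the thread.
// None of these names come from HBase or ZooKeeper.
class SplitRaceSketch {
    enum State { SPLITTING, SPLIT }

    // Stand-in for the region's node in ZooKeeper.
    private final Map<String, State> zkNodes = new HashMap<>();
    // Stand-in for the master's regions-in-transition (RIT) map.
    private final Map<String, State> regionsInTransition = new HashMap<>();

    // Creates the SPLITTING node; returns false if a node already exists.
    boolean createSplittingNode(String region) {
        if (zkNodes.containsKey(region)) {
            return false;
        }
        zkNodes.put(region, State.SPLITTING);
        regionsInTransition.put(region, State.SPLITTING);
        return true;
    }

    // Rollback deletes the node; the resulting nodeDeleted event
    // clears the region from RIT.
    void rollback(String region) {
        zkNodes.remove(region);
        regionsInTransition.remove(region);
    }

    // Moves the node from SPLITTING to SPLIT; returns false if the node
    // is missing or not in SPLITTING.
    boolean transitionToSplit(String region) {
        return zkNodes.replace(region, State.SPLITTING, State.SPLIT);
    }

    // Replays the steps from the thread and returns whether the first
    // split ends up being reported as successful.
    static boolean simulate(boolean checkTransition) {
        SplitRaceSketch s = new SplitRaceSketch();
        String p1 = "P1";

        s.createSplittingNode(p1);            // first split starts
        if (!s.createSplittingNode(p1)) {     // second split sees the node
            s.rollback(p1);                   // and rolls back, deleting it
        }

        boolean transitioned = s.transitionToSplit(p1); // node is gone
        if (checkTransition && !transitioned) {
            return false;                     // surface the failure
        }
        return true;                          // failure silently ignored
    }

    public static void main(String[] args) {
        System.out.println("ignoring result: success=" + simulate(false));
        System.out.println("checking result: success=" + simulate(true));
    }
}
```

With the result ignored, the first split is reported as successful even though the node and the RIT entry are both gone, which matches the "we think it is successful" step above. With the result checked, the missing node is at least noticed.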
> On Tue, Nov 6, 2012 at 1:42 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>
>> Ram, here's the master log corresponding to http://pastebin.com/cSdMbA2a.
>> Looks like e11e8b030897d6e5b973f8fe892e0eb2 was splitting on the
>> regionserver in question (node 169), so i'm guessing that's
>> 22f8fa73d8af837410ca270f344f6bb8's mommy.
>>
>> btw - you can see my balancer kick in 45 seconds later (runs every 10
>> minutes) here, but so far i think that's coincidence:
>> 2012-11-05 00:25:29,893 INFO org.apache.hadoop.hbase.master.HMaster:
>> BalanceSwitch=false
>>
>> I followed the trail of e11e8b030897d6e5b973f8fe892e0eb2 back to node 169
>> and found all this stuff about a failed split and an attempted rollback:
>> http://pastebin.com/xtXMZ388
>> Looks like it errors out when it goes to put a node in ZK but it's
>> already there.  I'm not familiar with what a good split log looks like,
>> so i'll stop commenting for now...
>>
>>
>> On Mon, Nov 5, 2012 at 10:30 PM, ramkrishna vasudevan <
>> [EMAIL PROTECTED]> wrote:
>>
>> > The log shows that even the first time the region was transitioned to
>> > SPLITTING, it was not populated in the Master's memory.
>> >
>> > On Tue, Nov 6, 2012 at 11:29 AM, ramkrishna vasudevan <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> > > Could you attach the master logs at this time
>> > > 2012-11-05 00:24:55?
>> > >
>> > > Regards
>> > > Ram
>> > >
>> > > On Tue, Nov 6, 2012 at 11:15 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> Took a brief look through all SPLIT related commits since 0.94.0...
>> > >> Found these:
>> > >>
>> > >> HBASE-6854 *
>> > >> HBASE-6713
>> > >> HBASE-6329 *
>> > >>
>> > >> HBASE-6088
>> > >>
>> > >> HBASE-5986
>> > >> HBASE-6070 *
>> > >>
>> > >>
>> > >> The ones marked with * are (IMHO) more likely to be related.
>> > >>
>> > >> -- Lars
>> > >>
>> > >> ________________________________
>> > >> From: Matt Corgan <[EMAIL PROTECTED]>
>> > >> To: dev <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
>> > >> Sent: Monday, November 5, 2012 9:28 PM
>> > >> Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
>> > >>
>> > >> Yeah - we were running .94.0 since it came out but never saw it there.
>> > >> I'll keep trying to narrow it down.  The only harm it's causing is log
>> > >> spam and failing to move daughters to a new regionserver, which are
>> > >> definitely problems, but it's not bringing down the cluster.
>> > >>
>> > >>
>> > >> On Mon, Nov 5, 2012 at 9:17 PM, lars hofhansl <[EMAIL PROTECTED]>
>> > >> wrote:
>> > >>
>> > >> > So it seems you can repeat this to some extent in 0.94.2, but you
>> > >> > have never seen this before?
>> > >> >
>> > >> >
>> > >> > -- Lars
>> > >> >
>> > >> >
>> > >> >
>> > >> > ________________________________
>> > >> >  From: Matt Corgan <[EMAIL PROTECTED]>
>> > >> > To: dev <[EMAIL PROTECTED]>
>> > >> > Sent: Monday, November 5, 2012 9:10 PM
>> > >> > Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
>>