

lars hofhansl 2012-11-04, 00:37
rajesh babu chintaguntla 2012-11-04, 02:10
Matt Corgan 2012-11-04, 02:27
Ted Yu 2012-11-04, 22:07
Matt Corgan 2012-11-05, 01:16

Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
This reminds me a lot of https://issues.apache.org/jira/browse/HBASE-4792

What I'd like to see is the log from the first time the master
receives the split message. My guess is that it says the region doesn't
exist anymore because the master already processed the split, but
there's a failure mode similar to HBASE-4792.

I saw this on another cluster last week, but the logs were rolling
based on size, so the original split message was lost.

J-D

On Sun, Nov 4, 2012 at 5:16 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> I've had a few successful splits today without any errors.  I'll turn up
> the importer speed tomorrow to start stressing the cluster more.
>
>
> On Sun, Nov 4, 2012 at 2:07 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> Matt:
>> Was there any region server crash before you noticed the repeated log for
>> bc62a8a72124a4ba3f6b9f302587903c ?
>>
>> I wonder if the following JIRA might be related:
>> HBASE-7083 SSH#fixupDaughter should force re-assign missing daughter
>>
>> Thanks
>>
>> On Sat, Nov 3, 2012 at 7:27 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>>
>> > Strangely, I don't see any record of that region in the master before what
>> > I already pasted even though I have logs back to 10/30.  Next time it
>> > happens I'll gather a full log record and try to debug while it's
>> > occurring.
>> >
>> >
>> > On Sat, Nov 3, 2012 at 7:10 PM, rajesh babu chintaguntla <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi Matt,
>> > > Can you paste some more master logs for region
>> > > bc62a8a72124a4ba3f6b9f30258790 from before the split?
>> > > I think it's not a problem with splitting.
>> > > We are getting
>> > >       LOG.warn("Region " + encodedName + " not found on server " + serverName +
>> > >         "; failed processing");
>> > > This log means there is no entry for the region in the servers map (not fully assigned).
>> > >     Set<HRegionInfo> hris = this.servers.get(sn);
>> > >     HRegionInfo foundHri = null;
>> > >     for (HRegionInfo hri: hris) {
>> > >       if (hri.getEncodedName().equals(encodedName)) {
>> > >         foundHri = hri;
>> > >         break;
>> > >       }
>> > >     }
>> > >     return foundHri;
>> > >
>> > >
>> > >
>> > >
>> > > On Sun, Nov 4, 2012 at 6:07 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > CC'ing dev list...
>> > > >
>> > > > Is anybody aware of any changes that went in recently that could cause
>> > > > this?
>> > > > I looked around a bit, but could not find anything obvious.
>> > > >
>> > > > -- Lars
>> > > >
>> > > >
>> > > >
>> > > > ________________________________
>> > > >  From: Matt Corgan <[EMAIL PROTECTED]>
>> > > > To: user <[EMAIL PROTECTED]>
>> > > > Sent: Saturday, November 3, 2012 5:27 PM
>> > > > Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
>> > > >
>> > > > I think the cluster is ok without running hbck, as restarting the
>> > > > regionserver process stops the warnings and everything looks ok otherwise.
>> > > >
>> > > > here's the regionserver right after the split happens:
>> > > > ------------------------
>> > > > 2012-11-01 22:45:28,726 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > regionserver:60020-0x13ab46479832953 Attempting to transition node
>> > > > bc62a8a72124a4ba3f6b9f302587903c from
>> > > > *RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT*
>> > > > 2012-11-01 22:45:28,730 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > regionserver:60020-0x13ab46479832953 Successfully transitioned node
>> > > > bc62a8a72124a4ba3f6b9f302587903c from
>> > > > RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
>> > > > 2012-11-01 22:45:28,730 DEBUG
>> > > > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the
>> > > > master to process the split for bc62a8a72124a4ba3f6b9f302587903c
>> > > > 2012-11-01 22:45:28,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > regionserver:60020-0x13ab46479832953 Attempting to transition node
>> > > > bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_R
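
To make the master-side lookup that rajesh quotes easier to reason about,
here is a minimal standalone sketch. It is not HBase code: it assumes plain
strings in place of ServerName and HRegionInfo, and the names
RegionLookupSketch, Result and lookup are made up for illustration. It
separates the two ways such a lookup can come up empty: the server has no
entry in the map at all, or the server has an entry but the region is not in
its set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for the lookup quoted in the thread; not HBase code. */
class RegionLookupSketch {

    /** Outcome of a lookup, so a log message can say why it failed. */
    enum Result { FOUND, SERVER_NOT_IN_MAP, REGION_NOT_ON_SERVER }

    /**
     * Looks for encodedName among the regions recorded for serverName.
     * Assumption: strings replace ServerName and HRegionInfo for brevity.
     */
    static Result lookup(Map<String, Set<String>> servers,
                         String serverName, String encodedName) {
        Set<String> regions = servers.get(serverName);
        if (regions == null) {
            // No entry for this server at all.
            return Result.SERVER_NOT_IN_MAP;
        }
        for (String region : regions) {
            if (region.equals(encodedName)) {
                return Result.FOUND;
            }
        }
        // The server is known, but this region is not recorded on it.
        return Result.REGION_NOT_ON_SERVER;
    }

    public static void main(String[] args) {
        // Hypothetical server name; the region name is the one from the thread.
        String server = "rs-example,60020,1";
        String region = "bc62a8a72124a4ba3f6b9f302587903c";

        Map<String, Set<String>> servers = new HashMap<>();
        Set<String> assigned = new HashSet<>();
        assigned.add(region);
        servers.put(server, assigned);

        System.out.println(lookup(servers, server, region));
        System.out.println(lookup(servers, server, "some-other-region"));
        System.out.println(lookup(servers, "unknown-server", region));
    }
}
```

Which of the two empty cases the 0.94.2 master actually hit is not
established in this thread; the sketch only shows how they could be told
apart in a log message.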
Matt Corgan 2012-11-05, 22:02
Jean-Daniel Cryans 2012-11-05, 22:15
Matt Corgan 2012-11-06, 01:33
Ted Yu 2012-11-06, 03:07
Matt Corgan 2012-11-06, 05:10
lars hofhansl 2012-11-06, 05:17
Matt Corgan 2012-11-06, 05:28
lars hofhansl 2012-11-06, 05:45
ramkrishna vasudevan 2012-11-06, 05:59
ramkrishna vasudevan 2012-11-06, 06:30
Matt Corgan 2012-11-06, 08:12
ramkrishna vasudevan 2012-11-06, 10:07
ramkrishna vasudevan 2012-11-06, 11:29
lars hofhansl 2012-11-06, 15:13
Matt Corgan 2012-11-06, 23:36
Matt Corgan 2012-11-05, 01:20
ramkrishna vasudevan 2012-11-04, 17:12