HBase, mail # dev - Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2


Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
lars hofhansl 2012-11-04, 00:37
CC'ing dev list...

Is anybody aware of any changes that went in recently that could cause this?
I looked around a bit, but could not find anything obvious.

-- Lars

________________________________
 From: Matt Corgan <[EMAIL PROTECTED]>
To: user <[EMAIL PROTECTED]>
Sent: Saturday, November 3, 2012 5:27 PM
Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
 
I think the cluster is ok without running hbck, as restarting the
regionserver process stops the warnings and everything looks ok otherwise.

Here's the regionserver log right after the split happens:
------------------------
2012-11-01 22:45:28,726 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13ab46479832953 Attempting to transition node
bc62a8a72124a4ba3f6b9f302587903c from
*RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT*
2012-11-01 22:45:28,730 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13ab46479832953 Successfully transitioned node
bc62a8a72124a4ba3f6b9f302587903c from
RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
2012-11-01 22:45:28,730 DEBUG
org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the
master to process the split for bc62a8a72124a4ba3f6b9f302587903c
2012-11-01 22:45:28,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13ab46479832953 Attempting to transition node
bc62a8a72124a4ba3f6b9f302587903c from
RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
2012-11-01 22:45:28,837 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13ab46479832953 Successfully transitioned node
bc62a8a72124a4ba3f6b9f302587903c from
RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
-----------------------------
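To pick these no-op transitions out of a larger regionserver log, a small script along these lines might help. It is only a sketch: the regular expression is inferred from the log lines quoted above, the function name `self_transitions` is made up for illustration, and it assumes the state names have not been broken across lines by mail wrapping.

```python
import re

# Pattern inferred from the ZKAssign DEBUG lines quoted above (an assumption,
# not an official log format). \s+ lets a match span wrapped lines.
TRANSITION = re.compile(
    r"(?P<outcome>Attempting to transition|Successfully transitioned) node\s+"
    r"(?P<region>[0-9a-f]{32})\s+from\s+(?P<src>RS_ZK_\w+)\s+to\s+(?P<dst>RS_ZK_\w+)"
)


def self_transitions(log_text):
    """Return (region, state) for each successful transition whose
    source state equals its target state."""
    hits = []
    for match in TRANSITION.finditer(log_text):
        succeeded = match.group("outcome").startswith("Successfully")
        if succeeded and match.group("src") == match.group("dst"):
            hits.append((match.group("region"), match.group("src")))
    return hits


sample = """\
regionserver:60020-0x13ab46479832953 Successfully transitioned node
bc62a8a72124a4ba3f6b9f302587903c from
RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
regionserver:60020-0x13ab46479832953 Successfully transitioned node
bc62a8a72124a4ba3f6b9f302587903c from
RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
"""
print(self_transitions(sample))
```

Only the second entry in the sample is reported, since the first one is a real state change from SPLITTING to SPLIT.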

The "transitioned node from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT"
message repeats for about 15 hours and finally stops, without manual
intervention, with these regionserver log messages:
-----------------------
2012-11-02 13:55:00,906 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13ab46479832953 Attempting to transition node
*bc62a8a72124a4ba3f6b9f302587903c* from RS_ZK_REGION_SPLIT to
RS_ZK_REGION_SPLIT

2012-11-02 13:55:00,916 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13ab46479832953 Successfully transitioned node
*bc62a8a72124a4ba3f6b9f302587903c* from RS_ZK_REGION_SPLIT to
RS_ZK_REGION_SPLIT

2012-11-02 13:55:00,916 INFO
org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META
updated, and report to master.
Parent=ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.*bc62a8a72124a4ba3f6b9f302587903c*.,
new regions:
ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351824327023.22c3fa48d17aa7312ca53566c680f0fd.,
ActiveListingRecord16,\x83\x07\xDC\x07\x11WebsiteIDX\x009024215,1351824327023.b0e0a488c711e5c7f74ee6198a9755a2..
Split took 15hrs, 9mins, 33sec

2012-11-02 13:55:00,945 INFO
org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META
updated, and report to master.
Parent=ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351790329631.f720436a6f8fd412d76fe3255f24e3b3.,
new regions:
ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351816858693.2880cf893175d2a852947c63ee8554a3.,
ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.*bc62a8a72124a4ba3f6b9f302587903c*..
Split took 17hrs, 14mins, 2sec
----------------------
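As a rough cross-check of the "15 or so hours" figure, the gap between the first and last ZKAssign lines quoted above can be computed directly from their timestamps. This is just timestamp arithmetic; the format string is an assumption based on how the log lines look, and `elapsed` is an illustrative helper name.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted log lines (assumed):
# date, time, then a comma and milliseconds.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(first_ts, last_ts):
    """Return the time between two log timestamps as a timedelta."""
    start = datetime.strptime(first_ts, LOG_TS_FORMAT)
    end = datetime.strptime(last_ts, LOG_TS_FORMAT)
    return end - start


# First SPLITTING -> SPLIT attempt and last SPLIT -> SPLIT attempt quoted above.
loop_duration = elapsed("2012-11-01 22:45:28,726", "2012-11-02 13:55:00,906")
print(loop_duration)  # 15:09:32.180000
```

That is within a second of the "Split took 15hrs, 9mins, 33sec" reported by SplitRequest, which presumably starts its clock slightly before the first line quoted here.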

And the master finally logs this:
-----------------------
2012-11-02 13:55:00,783 WARN
org.apache.hadoop.hbase.master.AssignmentManager: Region
bc62a8a72124a4ba3f6b9f302587903c not found on server
HadoopNode162.hotpads.srv,60020,1351788248279; failed processing

2012-11-02 13:55:00,783 WARN
org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region
bc62a8a72124a4ba3f6b9f302587903c from server
HadoopNode162.hotpads.srv,60020,1351788248279 but it doesn't exist anymore,
probably already processed its split
------------------------

I can't find any evidence that the region left the node during that time.
I'll have to catch it in action next time and see what the region is up to
during the problem period.

What does it mean that it successfully transitioned from SPLIT to SPLIT?
Is that a valid transition?
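One possible reading, offered as an assumption rather than a confirmed answer: the regionserver keeps rewriting the znode with the same state so that its version changes and the master gets notified again, and it stops once the write fails because the master has removed the node. The toy model below simulates that idea in memory. It is not HBase or ZooKeeper code; `FakeZNode`, `split_and_wait`, and their parameters are hypothetical names.

```python
SPLITTING = "RS_ZK_REGION_SPLITTING"
SPLIT = "RS_ZK_REGION_SPLIT"


class FakeZNode:
    """In-memory stand-in for a versioned znode (hypothetical, not a ZooKeeper API)."""

    def __init__(self, state):
        self.state = state
        self.version = 0
        self.deleted = False

    def transition(self, expected_state, new_state, expected_version):
        """Compare-and-set: return the new version, or -1 if the node is gone
        or no longer in the expected state and version."""
        if self.deleted or self.state != expected_state or self.version != expected_version:
            return -1
        self.state = new_state
        self.version += 1
        return self.version


def split_and_wait(znode, master_acks_after, max_spins=1000):
    """Move the node to SPLIT, then keep re-writing SPLIT -> SPLIT until the
    simulated master deletes the node. Returns the number of successful
    SPLIT -> SPLIT rewrites."""
    version = znode.transition(SPLITTING, SPLIT, znode.version)
    rewrites = 0
    while version != -1 and rewrites < max_spins:
        if rewrites == master_acks_after:
            # Simulated master: it has processed the split and removed the node.
            znode.deleted = True
        version = znode.transition(SPLIT, SPLIT, version)
        if version != -1:
            rewrites += 1
    return rewrites


print(split_and_wait(FakeZNode(SPLITTING), master_acks_after=3))  # 3
```

In this model a SPLIT to SPLIT "transition" is legal and changes nothing except the version, and a master that never reacts leaves the loop running until `max_spins` is hit, which would look like the repeated log lines above.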

Matt

On Sat, Nov 3, 2012 at 2:55 PM, Ted Yu <[EMAIL PROTECTED]> wrote: