Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
Matt Corgan, 2012-11-04, 00:27
I think the cluster is ok without running hbck, as restarting the regionserver process stops the warnings and everything looks ok otherwise.

Here's the regionserver log right after the split happens:

------------------------
2012-11-01 22:45:28,726 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13ab46479832953 Attempting to transition node bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
2012-11-01 22:45:28,730 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13ab46479832953 Successfully transitioned node bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
2012-11-01 22:45:28,730 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for bc62a8a72124a4ba3f6b9f302587903c
2012-11-01 22:45:28,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13ab46479832953 Attempting to transition node bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
2012-11-01 22:45:28,837 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13ab46479832953 Successfully transitioned node bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
-----------------------------

The "transitioned node from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT" message repeats for 15 or so hours and finally stops, without manual intervention, with these regionserver log messages:

-----------------------
2012-11-02 13:55:00,906 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13ab46479832953 Attempting to transition node bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
2012-11-02 13:55:00,916 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13ab46479832953 Successfully transitioned node bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
2012-11-02 13:55:00,916 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META updated, and report to master. Parent=ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.bc62a8a72124a4ba3f6b9f302587903c., new regions: ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351824327023.22c3fa48d17aa7312ca53566c680f0fd., ActiveListingRecord16,\x83\x07\xDC\x07\x11WebsiteIDX\x009024215,1351824327023.b0e0a488c711e5c7f74ee6198a9755a2.. Split took 15hrs, 9mins, 33sec
2012-11-02 13:55:00,945 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META updated, and report to master. Parent=ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351790329631.f720436a6f8fd412d76fe3255f24e3b3., new regions: ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351816858693.2880cf893175d2a852947c63ee8554a3., ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.bc62a8a72124a4ba3f6b9f302587903c.. Split took 17hrs, 14mins, 2sec
----------------------

And the master finally logs this:

-----------------------
2012-11-02 13:55:00,783 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region bc62a8a72124a4ba3f6b9f302587903c not found on server HadoopNode162.hotpads.srv,60020,1351788248279; failed processing
2012-11-02 13:55:00,783 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region bc62a8a72124a4ba3f6b9f302587903c from server HadoopNode162.hotpads.srv,60020,1351788248279 but it doesn't exist anymore, probably already processed its split
------------------------

I can't find any evidence that the region left the node during that time. I'll have to catch it in action next time and see what the region is up to during the problem period.

What does it mean that it successfully transitioned from SPLIT to SPLIT? Is that a valid transition?

Matt

On Sat, Nov 3, 2012 at 2:55 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Matt:
> From the following we can see that region bc62a8a72124a4ba3f6b9f302587903c