HMaster down caused by ZooKeeper?
Xiang Hua 2012-10-11, 19:14
Hi, we are using CDH3 with ZooKeeper, HDFS and HBase. Two HMasters are now down, and we found the following error messages in the ZooKeeper log:
2012-10-12 02:02:53,928 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2012-10-12 02:02:53,928 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2:31:34
Any recommendations?
Thanks!
beatls
Re: HMaster down caused by ZooKeeper?
谢良 2012-10-12, 04:16
Hi Xiang,
That exception is not the root cause. If you skim through the sendBuffer implementation in NIOServerCnxn.java, you'll find a catch block at the end that logs every exception and does not rethrow it.
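To illustrate the "log it, don't rethrow it" pattern described above, here is a minimal Python analogy. It is not ZooKeeper's source code: CancelledKeyError, send_buffer and process_requests are hypothetical names made up for this sketch, and the real sendBuffer is Java.

```python
import logging

log = logging.getLogger("sketch")


class CancelledKeyError(Exception):
    """Hypothetical stand-in for java.nio.channels.CancelledKeyException."""


def send_buffer(channel_open):
    """Try to send; on any failure, log the exception and return normally."""
    try:
        if not channel_open:
            # Simulates a client whose connection has already gone away.
            raise CancelledKeyError("selection key was cancelled")
        return True
    except Exception:
        # Logged at ERROR level with a stack trace, but NOT re-raised,
        # so the caller never sees the exception.
        log.exception("Unexpected Exception: ")
        return False


def process_requests(channel_states):
    """Keep handling every request even if one send fails."""
    return [send_buffer(state) for state in channel_states]
```

Calling process_requests([True, False, True]) logs one stack trace for the middle entry and still handles the third one. By analogy, an ERROR line from such a handler shows that a send to one client failed, not that the server stopped.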
IMHO, the HBase master log file is the right place to dig into :)
________________________________________
From: Xiang Hua [[EMAIL PROTECTED]]
Sent: 2012-10-12 3:14
To: [EMAIL PROTECTED]
Subject: hmaster down cause by zookeeper?
Hi, we are using CDH3 with ZooKeeper, HDFS and HBase. Two HMasters are now down, and we found the following error messages in the ZooKeeper log:
2012-10-12 02:02:53,928 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2012-10-12 02:02:53,928 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2:31:34
Any recommendations?
Thanks!
beatls
Re: Re: HMaster down caused by ZooKeeper?
Xiang Hua 2012-10-13, 01:55
Hi, below is the HBase log; there seems to be a connection problem with ZK. Could you please help me find out whether something is wrong?
2012-10-12 00:00:19,582 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181
2012-10-12 00:00:19,583 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, initiating session
2012-10-12 00:00:19,584 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, sessionid = 0x139c539bc090878, negotiated timeout = 40000
2012-10-12 00:00:19,588 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows
2012-10-12 00:00:19,589 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@1c0ec83c; hsa=bj-ecsxhm4f3I-r3-7-r810-2-hbase-stor-7:60020
2012-10-12 00:00:19,591 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is bj-ecsxhm4f3I-r3-8-r810-4-hbase-stor-9:60020
2012-10-12 00:00:19,594 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: The connection to hconnection-0x139c539bc090878 has been closed.
2012-10-12 00:00:19,594 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x139c539bc090878
2012-10-12 00:00:19,595 INFO org.apache.zookeeper.ZooKeeper: Session: 0x139c539bc090878 closed
2012-10-12 00:00:19,595 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-10-12 00:01:37,861 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 4 catalog row(s) and gc'd 0 unreferenced parent region(s)
2012-10-12 00:01:37,888 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: bj-ecsxhm4f3I-r3-8-r810-4-hbase-stor-9,60020,1347901025670=1, bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3,60020,1347901025664=1, bj-ecsxhm4f3I-r3-8-r810-3-hbase-stor-10,60020,1347901025664=0, bj-ecsxhm4f3I-r3-7-r810-1-hbase-stor-8,60020,1347901025661=1, bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1,60020,1347901025671=1, bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6,60020,1347901025662=0, bj-ecsxhm4f3I-r3-7-r810-2-hbase-stor-7,60020,1347901025661=1, bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2,60020,1347901025673=1
2012-10-12 00:01:37,889 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing. servers=8 regions=6 average=0.75 mostloaded=1 leastloaded=0
2012-10-12 00:02:37,993 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1:2181,bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3:2181,bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2:2181 sessionTimeout=180000 watcher=hconnection
2012-10-12 00:02:37,994 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
2012-10-12 00:02:37,995 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, initiating session
2012-10-12 00:02:37,996 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, sessionid = 0x339c539ba641db9, negotiated timeout = 40000
2012-10-12 00:02:38,000 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows
2012-10-12 00:02:38,000 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@41d55252; hsa=bj-ecsxhm4f3I-r3-7-r810-2-hbase-stor-7:60020
2012-10-12 00:02:38,002 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is bj-ecsxhm4f3I-r3-8-r810-4-hbase-stor-9:60020
2012-10-12 00:02:38,004 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: The connection to hconnection-0x339c539ba641db9 has been closed.
2012-10-12 00:02:38,004 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x339c539ba641db9
2012-10-12 00:02:38,005 INFO org.apache.zookeeper.ZooKeeper: Session: 0x339c539ba641db9 closed
2012-10-12 00:02:38,005 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-10-12 00:04:34,331 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1:2181,bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3:2181,bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2:2181 sessionTimeout=180000 watcher=hconnection
2012-10-12 00:04:34,332 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1/10.20.16.32:2181
2012-10-12 00:04:34,332 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1/10.20.16.32:2181, initiating session
2012-10-12 00:04:34,337 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1/10.20.16.32:2181, sessionid = 0x239c539ba632f30, negotiated timeout = 40000
2012-10-12 00:04:34,342 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows
2012-10-12 00:04:34,342 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@6a9d547c; hsa=bj-ecsxhm4f3I-r3-7-r810-2-hbase-stor-7:60020
2012-10-12 00:04:34,343 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is bj-ecsxhm4
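One detail in the log above is that the client asks for sessionTimeout=180000 while the server answers with negotiated timeout = 40000. Below is a minimal Python sketch for pulling such pairs out of a log. The function name timeout_mismatches and the regular expressions are illustrative assumptions based only on the lines quoted here, and whether this difference has anything to do with the HMaster failure is not established in this thread.

```python
import re

# Patterns are derived from the ZooKeeper client log lines quoted in this
# thread; other versions may format these lines differently.
REQUESTED = re.compile(r"sessionTimeout=(\d+)")
NEGOTIATED = re.compile(r"sessionid = (0x[0-9a-f]+), negotiated timeout = (\d+)")


def timeout_mismatches(lines):
    """Return (sessionid, requested_ms, negotiated_ms) where the two differ."""
    requested = None
    mismatches = []
    for line in lines:
        match = REQUESTED.search(line)
        if match:
            # Remember the timeout asked for by the most recent connection.
            requested = int(match.group(1))
            continue
        match = NEGOTIATED.search(line)
        if match and requested is not None:
            negotiated = int(match.group(2))
            if negotiated != requested:
                mismatches.append((match.group(1), requested, negotiated))
            requested = None
    return mismatches


# Two lines taken from the log excerpt above.
sample = [
    "2012-10-12 00:02:37,993 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1:2181,bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3:2181,bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2:2181 sessionTimeout=180000 watcher=hconnection",
    "2012-10-12 00:02:37,996 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, sessionid = 0x339c539ba641db9, negotiated timeout = 40000",
]
print(timeout_mismatches(sample))
```

To run it against a whole file, pass an open file object instead of the sample list, for example timeout_mismatches(open(path)) where path is wherever your master log lives.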