Zookeeper issue? (ConnectionLoss for /hbase/hbaseid)
Jean-Marc Spaggiari 2012-11-15, 23:19
Hi,
I ran a 30h MapReduce job, and now I can no longer connect to my HBase cluster.
The MapReduce job was configured in read-only mode, so only the log table received data. Everything else was only read.
Today I killed the job to replace one of the servers, which is too slow, and now I'm not able to connect to ZooKeeper anymore.
Below is the stack trace, and at the bottom is the ZKDump.
I think it should work again if I restart everything, but I'm wondering if there is any information on this situation which might help prevent it from happening in the future. I might be able to reproduce it, since I can re-run the job almost anytime.
I don't seem to have too many connections. The HBase shell is replying correctly. It's really only the Java application using ZooKeeper which is not working.
I'm running HBase 0.94.2, ZK 3.4.3 and Hadoop 1.0.3, all installed separately.
JM
2012-11-15 18:09:33,684 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:33,810 [main] WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2012-11-15 18:09:34,800 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:35,902 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:36,003 [main] WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2012-11-15 18:09:37,004 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:38,106 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:39,209 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:40,311 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:40,412 [main] WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2012-11-15 18:09:41,414 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refuse
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-11-15 18:09:42,516 [main-SendThread(cube:21818)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null,
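One detail in the log is worth checking when a Java client gets "connection refused" while the shell on the same machine works: the thread name main-SendThread(cube:21818) shows that the client was dialing port 21818, whereas ZooKeeper's default client port is 2181. The thread does not say which port this cluster actually used, so the expected port below is an assumption. One possible explanation, not confirmed in the thread, is that the client picked up a different configuration from its classpath. The following is a minimal sketch in plain Java (the class and method names are made up for illustration) that extracts the port from such a log line and compares it with the one you expect:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase or ZooKeeper.
final class ZkEndpointCheck {
    // Matches the "(host:port)" suffix of a ZooKeeper client thread name,
    // e.g. "main-SendThread(cube:21818)".
    private static final Pattern ENDPOINT =
            Pattern.compile("SendThread\\(([^:()\\s]+):(\\d+)\\)");

    /** Returns the port the client tried to reach, or -1 if the line has none. */
    static int portFromLogLine(String line) {
        Matcher m = ENDPOINT.matcher(line);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    /** True if the log line shows the client dialing the expected port. */
    static boolean usesExpectedPort(String line, int expectedPort) {
        return portFromLogLine(line) == expectedPort;
    }

    public static void main(String[] args) {
        String line = "2012-11-15 18:09:33,684 [main-SendThread(cube:21818)] WARN "
                + "org.apache.zookeeper.ClientCnxn - Session 0x0 for server null";
        // Assumption: the quorum listens on the ZooKeeper default port 2181.
        int expectedPort = 2181;
        System.out.println("client tried port: " + portFromLogLine(line));
        System.out.println("matches expected:  " + usesExpectedPort(line, expectedPort));
    }
}
```

If the two ports differ, the configuration the Java client is loading is the first thing to look at, before restarting the cluster.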
Re: Zookeeper issue? (ConnectionLoss for /hbase/hbaseid)
Jean-Marc Spaggiari 2012-11-15, 23:41
HBase stop/start did not help... ZooKeeper stop/start did not help either.
I reduced the code to the minimum:
    config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "cube");
    new HBaseAdmin(config);
But still getting the exception.
On the same machine, bin/hbase zkcli works fine and returns a result for get /hbase. bin/hbase shell works fine too.
I'm a bit lost here. I tried to google the issue without much result. I will continue to dig...
JM
Re: Zookeeper issue? (ConnectionLoss for /hbase/hbaseid)
Jean-Marc Spaggiari 2012-11-16, 01:44
So. I have restarted everything and I found the issue...
I ran the code above from Eclipse, where I have HBase trunk, while the cluster is running HBase 0.94.2.
Conclusion: an HBase trunk client is not compatible with a 0.94.2 cluster?
Anyway, now it's working again...
JM
Re: Zookeeper issue? (ConnectionLoss for /hbase/hbaseid)
Stack 2012-11-16, 07:35
On Thu, Nov 15, 2012 at 5:44 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> So. I have restarted everything and I found the issue...
>
> I ran the code above from Eclipse where I have HBase trunk. Cluster is
> running with HBase 0.94.2.
>
> Conclusion: HBase Trunk is not compatible with 0.94.2?
That is the case indeed.
Regarding 30h MR jobs: in my experience they always fail in the 29th hour and 59th minute. If you can, I would suggest cutting the MR job up into smaller pieces, so that if any piece fails, there is less to do over.
Good luck, St.Ack
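The suggestion to cut one long job into smaller pieces can be sketched as follows. This is only an illustration under an assumption the thread does not confirm: that the work can be described by a numeric half-open range (for example a numeric row-key prefix or a sequence of hourly batches). The names JobSplitter, Range and split are hypothetical and not part of HBase or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divides one large unit of work into smaller ranges,
// each of which could be submitted as its own, shorter MapReduce job.
final class JobSplitter {

    /** Half-open range [start, end) to be handled by one smaller job. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Splits [start, end) into at most `pieces` contiguous, non-empty ranges. */
    static List<Range> split(long start, long end, int pieces) {
        if (pieces <= 0 || end < start) {
            throw new IllegalArgumentException("need pieces > 0 and start <= end");
        }
        List<Range> ranges = new ArrayList<Range>();
        long total = end - start;
        long base = total / pieces;   // minimum size of each piece
        long extra = total % pieces;  // the first `extra` pieces get one more item
        long cursor = start;
        for (int i = 0; i < pieces; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                continue; // fewer items than pieces: skip empty ranges
            }
            ranges.add(new Range(cursor, cursor + size));
            cursor += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: 30 units of work (say, 30 hourly batches) cut into 4 jobs.
        for (Range r : split(0, 30, 4)) {
            System.out.println("submit job for range " + r);
        }
    }
}
```

Each range would then drive one job, for example by using it as the start and stop boundary of that job's scan. If a job fails, only its range has to be re-run.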