-
Some thing is wrong (yang.li, 2012-10-17, 06:06)
Hi, all:

I'm in charge of a ZooKeeper cluster of six nodes. It has worked well for the last six months, but yesterday, when I tried to list the children of a specific path, "/dp/monitor_root/child/CDSkafkaSensor/msg", something went wrong. Here is the output:

[zk: zk-6:2181(CONNECTED) 1] ls /dp/monitor_root/child/CDSkafkaSensor/msg
2012-10-17 13:47:28,719 [myid:] - WARN [main-SendThread(m32p118.bfdabc.com:2181):ClientCnxn$SendThread@1057] - Session 0x63a6d4272590001 for server m32p118.bfdabc.com/192.168.32.118:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len5362775 is out of range!
    at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)

WATCHER::

WatchedEvent state:Disconnected type:None path:null
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /dp/monitor_root/child/CDSkafkaSensor/msg
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1448)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1476)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:717)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)

I have tried many ways to solve this problem, but none of them works, not even the "super user" method. Now I can neither list the node nor delete it, so I really need help. Thank you!

yang.li
-
Re: Some thing is wrong (Jordan Zimmerman, 2012-10-17, 06:24)
I'll bet that you have a ZNode that has too many children and your jute buffer is maxed out. The default maximum for zookeeper api calls is 1MB. You can easily surpass this with large ZNodes (10K plus children).
-Jordan
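To put rough numbers on the "too many children" explanation, here is a minimal Java sketch that estimates how large a getChildren reply gets. It assumes the reply consists of a 16-byte reply header, a 4-byte child count and, for each child, a 4-byte length followed by the UTF-8 bytes of the name; treat that layout as an approximation for illustration. The class and method names (ChildrenResponseEstimate, estimateBytes, childrenFor) are made up for this example and are not ZooKeeper APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: back-of-the-envelope size of a getChildren reply.
final class ChildrenResponseEstimate {
    // Assumed reply header: xid (4 bytes) + zxid (8 bytes) + err (4 bytes).
    static final int REPLY_HEADER_BYTES = 16;
    // Assumed size of every length/count prefix.
    static final int INT_BYTES = 4;

    // Estimate from actual child names.
    static long estimateBytes(List<String> childNames) {
        long total = REPLY_HEADER_BYTES + INT_BYTES; // header + child count
        for (String name : childNames) {
            total += INT_BYTES + name.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Estimate from a child count and an average name length in bytes.
    static long estimateBytes(long childCount, int avgNameBytes) {
        return REPLY_HEADER_BYTES + INT_BYTES
                + childCount * (INT_BYTES + (long) avgNameBytes);
    }

    // Inverse: roughly how many children fit in a reply of the given size.
    static long childrenFor(long packetBytes, int avgNameBytes) {
        return (packetBytes - REPLY_HEADER_BYTES - INT_BYTES)
                / (INT_BYTES + (long) avgNameBytes);
    }

    public static void main(String[] args) {
        System.out.println("two children: "
                + estimateBytes(Arrays.asList("a", "bc")) + " bytes");
        System.out.println("10000 children, 100-byte names: "
                + estimateBytes(10000, 100) + " bytes");
        // 5362775 is the packet length from the error message in this thread;
        // the 20-byte average name length is purely an assumption.
        System.out.println("children in a 5362775-byte reply, 20-byte names: "
                + childrenFor(5362775L, 20));
    }
}
```

Under these assumptions, the 5,362,775-byte packet from the error message would correspond to a child count in the hundreds of thousands if names average about 20 bytes; the real count depends on the actual name lengths.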
-
Re: Some thing is wrong (yang.li, 2012-10-17, 06:34)
Thanks a lot, Jordan.

Is there a config parameter to control the number of children? Besides, how can I fix the situation now? I can't list the node and, even worse, I can't delete it either. There is a dead branch in the node tree, and I just want to find a way to clean it up. Any advice?

yang.li
-
Re: Some thing is wrong (Patrick Hunt, 2012-10-17, 06:35)
Hi Yang Li, this is likely a straightforward issue: the ZK client limits the size of the response that it will accept. By default this is around 4 MB. Notice the error message:

"Packet len5362775 is out of range!"

Basically, your getChildren call is returning a huge packet (5,362,775 bytes, which is over that limit); most likely you have a large number of children under "/dp/monitor_root/child/CDSkafkaSensor/msg".

See this thread for a solution.

Patrick
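The client-side check described here can be illustrated with a small stand-alone Java sketch. It is a simplified re-creation under assumptions, not the ZooKeeper source: it assumes the limit is read from the jute.maxbuffer system property with a default of 4096 * 1024 bytes (the "around 4 MB" mentioned above) and that a length is rejected when it is negative or not below the limit. The class and method names (PacketLengthCheck, configuredLimit, checkLength, accepts) are invented for this example.

```java
import java.io.IOException;

// Hypothetical re-creation of a client-side packet length check.
final class PacketLengthCheck {
    // Assumed client default when jute.maxbuffer is not set: 4 MB.
    static final int DEFAULT_CLIENT_LIMIT = 4096 * 1024;

    // Reads the limit from the jute.maxbuffer system property, if present.
    static int configuredLimit() {
        return Integer.getInteger("jute.maxbuffer", DEFAULT_CLIENT_LIMIT);
    }

    // Throws when the announced packet length is outside the accepted range.
    static void checkLength(int len, int limit) throws IOException {
        if (len < 0 || len >= limit) {
            throw new IOException("Packet len" + len + " is out of range!");
        }
    }

    // Convenience wrapper that reports acceptance as a boolean.
    static boolean accepts(int len, int limit) {
        try {
            checkLength(len, limit);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int observed = 5362775; // packet length from the error in this thread
        int limit = configuredLimit();
        System.out.println("limit    = " + limit);
        System.out.println("accepted = " + accepts(observed, limit));
    }
}
```

Running the sketch with a larger value, for example `java -Djute.maxbuffer=8388608 PacketLengthCheck`, lets the same length pass the check. For a real client the property would have to be set on the JVM that runs the client, and the servers may need a matching setting; once the listing succeeds, the children can be deleted to shrink the node.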
-
Re: Some thing is wrong (Patrick Hunt, 2012-10-17, 06:36)
this one: http://markmail.org/message/dlik7ayqkq2mu2nf
-
Re: Some thing is wrong (Ted Dunning, 2012-10-17, 16:34)
Using 6 nodes for ZK is a bit odd. Actually, it is a bit even.
If all of the nodes are involved in the quorum, you will get lower write throughput than with 5 nodes and a slightly higher chance of failure: a quorum of 6 needs 4 nodes, so either cluster goes down once 3 nodes fail, and 3 failures out of 6 are more likely than 3 out of 5.

What motivated your choice of 6 nodes?
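The arithmetic behind this point can be sketched in a few lines of Java. The sketch assumes a simple majority quorum and, for the probability comparison, that each voting node fails independently with the same probability p; both the independence assumption and the value of p are illustrative only. The names (QuorumMath, quorumSize, toleratedFailures, probAtLeast) are made up for this example.

```java
// Hypothetical helper for majority-quorum arithmetic.
final class QuorumMath {
    // Smallest majority of the voting members.
    static int quorumSize(int voters) {
        return voters / 2 + 1;
    }

    // Number of voters that can fail while a majority is still available.
    static int toleratedFailures(int voters) {
        return voters - quorumSize(voters);
    }

    // Binomial coefficient C(n, k), computed in floating point.
    static double binomial(int n, int k) {
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c;
    }

    // Probability that at least `failures` of `voters` nodes are down,
    // assuming independent failures with probability p each.
    static double probAtLeast(int voters, int failures, double p) {
        double total = 0.0;
        for (int k = failures; k <= voters; k++) {
            total += binomial(voters, k)
                    * Math.pow(p, k) * Math.pow(1.0 - p, voters - k);
        }
        return total;
    }

    public static void main(String[] args) {
        double p = 0.01; // assumed per-node failure probability
        for (int voters : new int[] {5, 6}) {
            int fatal = toleratedFailures(voters) + 1;
            System.out.println(voters + " voters: quorum=" + quorumSize(voters)
                    + ", tolerated failures=" + toleratedFailures(voters)
                    + ", P(quorum lost)=" + probAtLeast(voters, fatal, p));
        }
    }
}
```

Both 5 and 6 voters tolerate 2 failures, so the extra voter adds replication work on every write without adding fault tolerance, and it gives the cluster one more node that can be among the 3 failures that break the quorum.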
-
Re: Some thing is wrong (yang.li, 2012-10-18, 01:29)
Thank you for your advice, Ted.

Actually, the sixth node is set to "observer mode"; we just put it there as a standby.

yang.li
-
Re: Some thing is wrong (Ted Dunning, 2012-10-18, 01:58)
Ahh. Better.