zookeeper dataDir corrupted and now hbase can not connect to zookeeper
Hi,
Something abnormal happened to my Hadoop cluster. In CDH4, the default
location of ZooKeeper's snapshot directory and dataDir is /var/lib/zookeeper.
The disk on which /var is located became full, and the cluster went down
(ZooKeeper and HBase were both in ERROR status). I have cleaned up /var,
but it seems that the ZooKeeper snapshot and dataDir location is not getting
updated, and the HBase master is not able to connect to ZooKeeper.
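
Since the outage started with the /var partition filling up, one simple guard is to watch free space on the filesystem that holds the ZooKeeper dataDir. The sketch below uses Python's standard `shutil.disk_usage`; the 10% threshold and the function names are hypothetical choices for illustration, not ZooKeeper settings.

```python
import shutil

# Hypothetical threshold (an assumption, not a ZooKeeper setting):
# treat the partition as "low" when less than 10% of it is free.
MIN_FREE_FRACTION = 0.10


def free_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    if usage.total == 0:
        return 0.0  # guard against pseudo-filesystems that report no size
    return usage.free / usage.total


def is_low_on_space(path: str, min_free: float = MIN_FREE_FRACTION) -> bool:
    """True if the filesystem holding `path` has less than `min_free` free."""
    return free_fraction(path) < min_free
```

On a ZooKeeper host this could be called as `is_low_on_space("/var/lib/zookeeper")` from a cron job or monitoring check, so the partition is flagged before it fills completely.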

We restarted ZooKeeper and HBase a couple of times. We also stopped the
ZooKeeper nodes one by one to isolate the corrupted node, but it seems that
all 3 ZooKeeper nodes got corrupted. This is strange, as only one server's
disk filled up.
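
ZooKeeper only elects a leader when a strict majority of the configured voting servers can reach each other. A minimal sketch of that arithmetic (the function names are illustrative) shows the limits of stopping nodes one at a time in a 3-node ensemble: with one node stopped, the remaining 2 still form a quorum, but with two stopped, the last server cannot form a quorum on its own and presumably keeps cycling through leader election.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of `ensemble_size` voting servers."""
    return ensemble_size // 2 + 1


def has_quorum(servers_up: int, ensemble_size: int) -> bool:
    """True if `servers_up` running voters are enough to elect a leader."""
    return servers_up >= quorum_size(ensemble_size)


if __name__ == "__main__":
    n = 3  # the 3-node ensemble described in the post
    for up in range(n, -1, -1):
        print(f"{up}/{n} servers up -> quorum: {has_quorum(up, n)}")
```

With `n = 3`, the loop reports a quorum for 3 and 2 running servers and none for 1 or 0. The same formula gives `quorum_size(4) == 3`, which is why a 4-node ensemble tolerates no more failures than a 3-node one.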

Here is the exception we got:

2013-01-18 15:17:20,840 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-01-18 15:17:20,840 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /168.72.70.92:39880 (no session established for client)
2013-01-18 15:17:20,922 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,123 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,328 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 60000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.quorum.Learner: FOLLOWING - LEADER ELECTION TOOK - 829
2013-01-18 15:17:21,330 WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader
java.io.EOFException
       at java.io.DataInputStream.readInt(DataInputStream.java:375)
       at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
       at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
       at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
       at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
       at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:272)
       at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-01-18 15:17:21,330 INFO org.apache.zookeeper.server.quorum.Learner: shutdown called
java.lang.Exception: shutdown Follower
       at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.FollowerZooKeeperServer: Shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.ZooKeeperServer: shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /var/lib/zookeeper/version-2/snapshot.700000092
2013-01-18 15:17:21,348 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election. My id = 1, proposed zxid=0x7000000f4
2013-01-18 15:17:21,349 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 1 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,349 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 4 at election address dgmstsw001.nam.nsroot.net/168.72.70.89:4181
java.net.ConnectException: Connection refused
       at java.net.PlainSocketImpl.socketConnect(Native Method)
       at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
       at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
       at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
       at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
       at java.net.Socket.connect(Socket.java:529)
       at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
       at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
       at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
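
The zxid values in this log can be read directly: a zxid is a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a per-epoch transaction counter, and the snapshot and transaction-log files in version-2 are named after a zxid written in hex (for example `snapshot.700000092`). The sketch below decodes those values and lists the files in a version-2 directory ordered by zxid, which may make an empty or unexpectedly old file easier to spot after a disk-full event. The helper names are hypothetical, and a 0-byte file is only a hint worth checking, not proof of corruption.

```python
import os
import re
from typing import List, Optional, Tuple

# Snapshot and transaction-log files in dataDir/version-2 are named
# "snapshot.<zxid>" and "log.<zxid>", with the zxid in hex (no "0x" prefix).
_ZK_FILE_RE = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter): the high and low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def parse_zk_filename(name: str) -> Optional[Tuple[str, int]]:
    """Return (kind, zxid) for a snapshot/log file name, or None if it is neither."""
    match = _ZK_FILE_RE.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2), 16)


def list_zk_files(version_dir: str) -> List[Tuple[int, str, int, str]]:
    """Return (zxid, kind, size_bytes, name) per snapshot/log file, oldest zxid first."""
    entries = []
    for name in os.listdir(version_dir):
        parsed = parse_zk_filename(name)
        if parsed is None:
            continue  # not a snapshot or transaction-log file
        kind, zxid = parsed
        size = os.path.getsize(os.path.join(version_dir, name))
        entries.append((zxid, kind, size, name))
    return sorted(entries)


def describe(version_dir: str) -> None:
    """Print one line per file, marking 0-byte files for a closer look."""
    for zxid, kind, size, name in list_zk_files(version_dir):
        epoch, counter = split_zxid(zxid)
        note = "  <- empty" if size == 0 else ""
        print(f"{name:<24} epoch={epoch:#x} counter={counter:#x} size={size}{note}")
```

For example, `split_zxid(0x7000000F4)` returns `(7, 244)`, matching the `0x7 (n.peerEPoch)` field in the notifications, while `snapshot.700000092` decodes to epoch 7, counter 0x92. The proposed zxid is therefore later than the snapshot, so the remaining transactions presumably came from the `log.*` files. Running `describe("/var/lib/zookeeper/version-2")` on each of the three nodes is one way to compare what each server actually has on disk.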

I would appreciate any help with this.

Thanks,
Saurabh.
View this message in context: http://zookeeper-user.578899.n2.nabble.com/zookeeper-dataDir-corrupted-and-now-hbase-can-not-connect-to-zookeeper-tp7578425.html
Sent from the zookeeper-user mailing list archive at Nabble.com.