NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Zookeeper >> mail # user >> zookeeper dataDir corrupted and now hbase can not connect to zookeeper


zookeeper dataDir corrupted and now hbase can not connect to zookeeper
Hi,
Something abnormal happened to my Hadoop cluster. In CDH4, the default
location of the ZooKeeper snapshot directory and dataDir is /var/lib/zookeeper.
The disk holding /var became full and the cluster went down (ZooKeeper and
HBase were in ERROR status). I have cleaned up /var, but it seems the
ZooKeeper snapshot and dataDir are not getting updated, and the HBase master
is not able to connect to ZooKeeper.
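A minimal sketch for checking, on each ZooKeeper host, whether the volume holding the dataDir has free space again and whether the snapshot and transaction log files are actually being updated. It assumes the CDH4 default path mentioned above plus the version-2 subdirectory that appears in the server log; snapshot.* and log.* are the ZooKeeper file-name prefixes, while inspect_data_dir is a hypothetical helper name.

```python
import os
import shutil

# Assumption: CDH4 default dataDir/snapDir; ZooKeeper keeps its files in a
# "version-2" subdirectory (the path shown in the server log).
DATA_DIR = "/var/lib/zookeeper/version-2"


def inspect_data_dir(path=DATA_DIR):
    """Return (free_bytes, files) for the volume holding `path`.

    `files` lists (name, size_bytes, mtime) for ZooKeeper snapshot.* and
    log.* files, newest first, so you can see whether they are still
    being written.
    """
    free_bytes = shutil.disk_usage(path).free
    files = []
    for name in os.listdir(path):
        if name.startswith(("snapshot.", "log.")):
            st = os.stat(os.path.join(path, name))
            files.append((name, st.st_size, st.st_mtime))
    # Most recently modified first.
    files.sort(key=lambda f: f[2], reverse=True)
    return free_bytes, files
```

Usage: `free, files = inspect_data_dir()` on each host. If the newest file's timestamp does not advance after a restart, the server has not written new transactions there, which is also expected while no quorum has formed.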

We restarted ZooKeeper and HBase a couple of times. We also stopped the
ZooKeeper nodes one by one to isolate the corrupted node, but it seems all
3 ZooKeeper nodes got corrupted. This is strange, as only one server's disk
got filled.
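As an alternative to stopping nodes one by one, each server can be asked for its role with the `srvr` four-letter command on the client port. A minimal sketch, assuming ZooKeeper 3.4-era four-letter commands, the default client port 2181, and hypothetical host names (substitute the hosts listed in zoo.cfg). A server that is still in leader election (LOOKING, as in the log) should not report a `Mode:` line, and a 3-node ensemble needs at least 2 servers in leader or follower mode to form a quorum.

```python
import socket

# Assumptions: hypothetical host names and the default client port.
HOSTS = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
CLIENT_PORT = 2181


def srvr(host, port=CLIENT_PORT, timeout=3.0):
    """Send the 'srvr' four-letter command; return the reply text ('' on error)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"srvr")
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
        return b"".join(chunks).decode("utf-8", "replace")
    except OSError:  # refused, timed out, DNS failure, ...
        return ""


def parse_mode(reply):
    """Return the 'Mode:' value (leader/follower/standalone) or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def has_quorum(modes, ensemble_size):
    """A quorum needs a strict majority of the ensemble serving requests."""
    serving = sum(1 for m in modes if m in ("leader", "follower"))
    return serving >= ensemble_size // 2 + 1
```

Usage: `modes = [parse_mode(srvr(h)) for h in HOSTS]`, then `has_quorum(modes, len(HOSTS))`; a `None` entry marks a node that is down or not serving.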

Here is the exception we got:

2013-01-18 15:17:20,840 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-01-18 15:17:20,840 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /168.72.70.92:39880 (no session established for client)
2013-01-18 15:17:20,922 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,123 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,328 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 60000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.quorum.Learner: FOLLOWING - LEADER ELECTION TOOK - 829
2013-01-18 15:17:21,330 WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:272)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-01-18 15:17:21,330 INFO org.apache.zookeeper.server.quorum.Learner: shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.FollowerZooKeeperServer: Shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.ZooKeeperServer: shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /var/lib/zookeeper/version-2/snapshot.700000092
2013-01-18 15:17:21,348 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election. My id = 1, proposed zxid=0x7000000f4
2013-01-18 15:17:21,349 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 1 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,349 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 4 at election address dgmstsw001.nam.nsroot.net/168.72.70.89:4181
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:529)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
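The zxid values in this log can be decoded directly: a zxid is a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a counter within that epoch, and the hex suffix of snapshot.* and log.* file names is a zxid. A small sketch (the helper names are hypothetical) for reading the values above:

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def zxid_from_filename(name):
    """Parse the hex zxid suffix of 'snapshot.<hex>' or 'log.<hex>'."""
    return int(name.split(".", 1)[1], 16)
```

For example, `split_zxid(0x7000000f4)` returns `(7, 244)`, matching `0x7 (n.peerEPoch)`, and `split_zxid(zxid_from_filename("snapshot.700000092"))` returns `(7, 146)`. Both the notification from server 3 and server 1's own proposal carry the same zxid, 0x7000000f4, at epoch 7.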

I would appreciate any help with this.

Thanks,
Saurabh.
View this message in context: http://zookeeper-user.578899.n2.nabble.com/zookeeper-dataDir-corrupted-and-now-hbase-can-not-connect-to-zookeeper-tp7578425.html
Sent from the zookeeper-user mailing list archive at Nabble.com.