Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper, mail # user - Query regarding zookeeper


+
Subramanyeswara Bhavirise... 2013-03-30, 23:12
Copy link to this message
-
Re: Query regarding zookeeper
Kathleen Ting 2013-04-04, 05:39
Hi Subramanyeswara,

It can take quite a bit of time for the leader to communicate the
znode data to the followers during startup. initLimit is the amount of
time the leader will allow the followers to download the data from the
leader and get to a point where they are ready to serve requests to
clients. If this time is exceeded, the leader will close the
connection to the follower and the quorum process will restart. Your
current initLimit value of 20 sec is typically more than enough time.
That said, if you have a slow connection or the data is very large it
might be necessary to increase this.

The syncLimit controls the amount of time the quorum members will
allow each other to communicate. If the leader doesn't hear from a
follower in the syncLimit time, it will drop the follower from the
quorum, and vice versa. It's important not to set this parameter too
high as it's one of the ways a server detects networking issues and
causes recovery to take place (i.e. the follower will drop out of the
quorum and try to reconnect, all clients on that follower will
disconnect and reconnect to another server, etc...).

Can you try increasing both initLimit and syncLimit to 100 on all ZK
servers and restarting? Note that it's not good to have syncLimit be
such a high value - fixed by ZOOKEEPER-1521.

Regards,
Kate

On Sat, Mar 30, 2013 at 4:12 PM, Subramanyeswara Bhavirisetty
<[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have 5 box zookeeper quorum. 2 / 5 are unable to deserialize the message from leader and hence they are not coming up. I tried restarting few times but still seeing the same issue. This issue is observed when we were trying to do the following:
>
>
> 1>   We have a script which brings down the zookeeper server on the leader. We are trying to test if new follower takes over once leader is down.
>
> 2>   The script has been running since 2-3 days. Surprisingly we don't see any issue for the first 2 days but after that we hit the below issue wherein 2 out 5 zookeeper machines are unable to deserialize the message.
>
> zookeeper server = 3.4.5
>
> Following are the zookeeper settings:
>
> tickTime=4000
> initLimit=20
> syncLimit=15
> snapCount=100000
>
> Logs from Follower
>
> ---------------
>
> Thu Mar 28 19:14:14 2013: 2013-03-28 19:14:14,158 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 0 (n.zxid), 3
> (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,811 - WARN
> [QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when following the leader
> Thu Mar 28 19:14:41 2013: java.io.EOFException
> Thu Mar 28 19:14:41 2013:       at
> java.io.DataInputStream.readInt(DataInputStream.java:392)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,811 - INFO
> [QuorumPeer:/0.0.0.0:2181:Follower@165] - shutdown called
> Thu Mar 28 19:14:41 2013: java.lang.Exception: shutdown Follower
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,812 - INFO
> [QuorumPeer:/0.0.0.0:2181:QuorumPeer@621] - LOOKING