On Wed, Dec 5, 2012 at 8:51 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> This sounds like configuration somewhere.
> Have you checked the usual suspects:
> a) GC on client or ZK cluster?
We don't have this instrumented yet, I'll raise the priority on it though.
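On (a), here is a minimal sketch for getting GC visibility on the ZooKeeper servers. It assumes a HotSpot JVM from the Java 6/7/8 era. It also assumes a ZooKeeper release whose bin/zkEnv.sh sources conf/java.env and whose start script passes JVMFLAGS to the JVM; that is worth confirming in your version. The log path is a hypothetical placeholder. Java 9 and later replaced these flags with -Xlog:gc*.

```shell
# conf/java.env (sketch): assumed to be sourced by bin/zkEnv.sh, with
# JVMFLAGS passed through to the server JVM by the start script.
# Hypothetical log location; point it at a local disk.
GC_LOG=/var/log/zookeeper/gc.log

# HotSpot flags for Java 6/7/8 (removed in Java 9+, which uses -Xlog:gc*).
# PrintGCApplicationStoppedTime logs the length of each stop-the-world
# pause, which is what can push a session past its timeout.
export JVMFLAGS="$JVMFLAGS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:$GC_LOG"
```

Look for long "Total time for which application threads were stopped" entries that coincide with the missed watches or the latency swings. The same flags can be added to the client JVMs.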
> b) bad configuration on ZK which allows split quorum? (really....
> surprisingly common)
I think we're good there.
> c) bad configuration on client for connect?
I think we're good there, too.
> d) ZK swapping out due to inactivity during memory pressure?
Can you cite an explanation or explain this here? I'm not sure what to
look for. It wouldn't be clients failing to detect that they've lost their
session and creating a new one w/o the watches, since "retrying" a znode
update does trigger the watch on the lapsed clients.
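Here is a quick Linux check for (d). It assumes the concern is the OS paging out an idle ZooKeeper JVM under memory pressure, so that the next access stalls while the pages are read back. The kernel reports how much of a process's anonymous memory is currently swapped out in the VmSwap line of /proc/<pid>/status (kernels 2.6.34 and later). The function names are hypothetical.

```shell
# Print the swapped-out size in kB from a /proc/<pid>/status file.
# Linux-only: the VmSwap line exists in kernels 2.6.34 and later.
# Prints 0 when the line is absent.
swapped_kb() {
  awk '/^VmSwap:/ { kb = $2 } END { print kb + 0 }' "$1"
}

# Hypothetical helper: report swap usage for every java process on this
# host (the ZooKeeper server JVM or client JVMs), plus the swappiness knob.
report_java_swap() {
  for pid in $(pgrep java); do
    status="/proc/$pid/status"
    [ -r "$status" ] || continue   # the process may have exited meanwhile
    echo "pid $pid swapped_kb=$(swapped_kb "$status")"
  done
  echo "vm.swappiness=$(cat /proc/sys/vm/swappiness)"
}
```

Run report_java_swap on each ensemble member and on the client hosts. A nonzero figure for a ZooKeeper or client JVM is worth correlating with the latency swings.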
> On Thu, Dec 6, 2012 at 1:50 AM, Ian Kallen <[EMAIL PROTECTED]> wrote:
>> Thanks for replying. AFAIK, the change rate isn't high. That said, there's
>> a Storm cluster and a few other things whose internals I'm not
>> familiar with, and they may be poking their znodes at a higher rate than I
>> realize. The missed watches are on applications that don't have
>> rapid changes in any of their nodes. But we regularly see clients not
>> fire data watches; subsequent changes do fire them, so the clients
>> seem to be connected and are just missing that first trigger. Latencies
>> also sometimes swing pretty widely. So it had me wondering how
>> to measure capacity utilization on the ensemble.
>> On Wed, Dec 5, 2012 at 3:37 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>> > This looks like very low load.
>> > What is the rate of change on znodes (i.e. what is the desired watch
>> > rate)?
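Here is a rough way to answer the change-rate question, assuming the four-letter srvr command is available (stat prints the same line). Every committed transaction gets the next zxid, and srvr reports the server's latest one as "Zxid: 0x...". Sampling it twice and dividing the difference by the interval approximates ensemble-wide transactions per second.

This counts all state-changing transactions, including session creation and expiry, not only znode updates. The comparison is only valid if no leader election happened in between, because a new epoch changes the high 32 bits. The function names, host and port are hypothetical.

```shell
# Read `srvr` (or `stat`) output on stdin; print the last zxid in decimal.
zxid_from_srvr() {
  hex=$(awk '/^Zxid:/ { print $2 }')
  [ -n "$hex" ] || return 1
  echo $(($hex))
}

# Transactions per second between two zxid samples taken $3 seconds apart.
# Refuses to compare zxids from different epochs (high 32 bits differ).
write_rate() {
  if [ $(($1 >> 32)) -ne $(($2 >> 32)) ]; then
    echo "epoch changed between samples; discard this interval" >&2
    return 1
  fi
  echo $(( ($2 - $1) / $3 ))
}

# Hypothetical usage against one ensemble member, port 2181 assumed.
sample_write_rate() {
  host=$1
  interval=${2:-60}
  z1=$(echo srvr | nc "$host" 2181 | zxid_from_srvr) || return 1
  sleep "$interval"
  z2=$(echo srvr | nc "$host" 2181 | zxid_from_srvr) || return 1
  write_rate "$z1" "$z2" "$interval"
}
```

For example, "sample_write_rate zoo-ensemble1 60" prints the integer transaction rate over one minute.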
>> > On Wed, Dec 5, 2012 at 10:10 PM, Ian Kallen <[EMAIL PROTECTED]> wrote:
>> >> We have an ensemble of three servers and have observed varying
>> >> latencies, watches that seemingly don't get fired on the client and
>> >> other operational issues. Here are the current # connections/watches:
>> >> shell$ for i in 1 2 3; do echo wchs | nc zoo-ensemble$i 2181; done
>> >> 198 connections watching 174 paths
>> >> Total watches:1914
>> >> 41 connections watching 126 paths
>> >> Total watches:1010
>> >> 50 connections watching 143 paths
>> >> Total watches:952
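The imbalance can be tracked with a small sketch. It reads the concatenated wchs output from a loop like the one above and prints each server's share of the total watch count. Servers are numbered in the order their output appears, and the function names are hypothetical. Each client session is attached to a single server and its watches are registered there. The watch split therefore largely follows the connection split: 198 connections on the first server versus 41 and 50 on the other two.

```shell
# Fetch `wchs` from each host given as an argument (port 2181 assumed).
collect_wchs() {
  for host in "$@"; do
    echo wchs | nc "$host" 2181
  done
}

# Summarize concatenated `wchs` output: one line per server, in input order,
# with that server's share of the ensemble-wide watch count.
summarize_wchs() {
  awk '
    /connections watching/ { n++; conns[n] = $1 }
    /^Total watches:/      { split($0, a, ":"); w[n] = a[2] + 0; total += w[n] }
    END {
      for (i = 1; i <= n; i++) {
        share = (total > 0) ? 100 * w[i] / total : 0
        printf "server%d connections=%d watches=%d share=%.1f%%\n", i, conns[i], w[i], share
      }
    }'
}
```

Running "collect_wchs zoo-ensemble1 zoo-ensemble2 zoo-ensemble3 | summarize_wchs" against the numbers above would report shares of 49.4%, 26.1% and 24.6%.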
>> >> I don't know if we should be concerned that the number of watches is
>> >> in the thousands (or that zoo-ensemble1 is handling about the
>> >> same number of watches as 2 & 3 combined). Should we be tuning the JVM
>> >> in any particular way according to the number of watches? From a
>> >> capacity planning standpoint, what metrics and guidelines should we be
>> >> observing before we split our tree into separate ensembles or grow the
>> >> current ensemble?
>> >> thanks,
>> >> -Ian
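On the capacity-planning question, here is one starting point, assuming ZooKeeper 3.4 or later, where the four-letter mntr command prints one key/value pair per line. The idea is to graph a handful of its gauges per server over time and watch how latency and outstanding requests move as the watch and znode counts grow. The key selection and function names are assumptions of this sketch, not an official guideline.

```shell
# Keys worth graphing per server over time (an assumed selection, not an
# official list). All appear in ZooKeeper 3.4's `mntr` output.
MNTR_KEYS="zk_avg_latency zk_max_latency zk_outstanding_requests zk_packets_received zk_znode_count zk_watch_count zk_approximate_data_size zk_open_file_descriptor_count"

# Filter `mntr` output on stdin down to MNTR_KEYS, one "key=value" per line.
select_mntr() {
  awk -v keys="$MNTR_KEYS" '
    BEGIN { n = split(keys, k, " "); for (i = 1; i <= n; i++) want[k[i]] = 1 }
    ($1 in want) { print $1 "=" $2 }'
}

# Hypothetical collector: one timestamped block per host (port 2181 assumed).
poll_mntr() {
  for host in "$@"; do
    echo "# $(date -u +%Y-%m-%dT%H:%M:%SZ) $host"
    echo mntr | nc "$host" 2181 | select_mntr
  done
}
```

For example, run "poll_mntr zoo-ensemble1 zoo-ensemble2 zoo-ensemble3" from cron. zk_max_latency is the maximum since startup or the last srst (reset server statistics). Sending srst after each poll should therefore make it a per-interval figure. zk_packets_received is a running counter, so its deltas are what matter.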