Ian Kallen
2012-12-05, 21:10
Ted Dunning
2012-12-05, 23:37
Ian Kallen
2012-12-06, 00:50
Ted Dunning
2012-12-06, 04:51
Ian Kallen
2012-12-06, 19:25
Ted Dunning
2012-12-07, 01:34
Ian Kallen
2012-12-07, 17:50
Ted Dunning
2012-12-07, 19:24
-
determining zookeeper capacity requirements
Ian Kallen, 2012-12-05, 21:10
We have an ensemble of three servers and have observed varying latencies, watches that seemingly don't get fired on the client, and other operational issues. Here are the current numbers of connections/watches:

    shell$ for i in 1 2 3; do echo wchs | nc zoo-ensemble$i 2181; done
    198 connections watching 174 paths
    Total watches:1914
    41 connections watching 126 paths
    Total watches:1010
    50 connections watching 143 paths
    Total watches:952

I don't know whether we should be concerned that the number of watches is in the thousands (or that zoo-ensemble1 is handling roughly the same number of watches as 2 and 3 combined). Should we be tuning the JVM in any particular way according to the number of watches? From a capacity planning standpoint, what metrics and guidelines should we be observing before we split our tree into separate ensembles or grow the current ensemble?

thanks,
-Ian
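To make the imbalance in the `wchs` numbers easier to track over time, here is a minimal bash sketch that labels each server's "Total watches" figure and its share of the ensemble total. It assumes bash, `nc` and `awk` are available and that the servers answer four-letter words on port 2181 (newer ZooKeeper releases may require the command to be listed in `4lw.commands.whitelist`). The function names are hypothetical helpers, not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: summarize `wchs` output across an ensemble.

# Read one server's `wchs` output on stdin; print its "Total watches" count.
total_watches() {
  awk -F: '/^Total watches/ { print $2 + 0 }'
}

# Read "host count" pairs on stdin; print "host count share%" per host.
# Shares are truncated to whole percentages.
print_shares() {
  awk '{ h[NR] = $1; c[NR] = $2; sum += $2 }
       END {
         for (i = 1; i <= NR; i++)
           printf "%s %d %d%%\n", h[i], c[i], (sum > 0 ? int(100 * c[i] / sum) : 0)
       }'
}

# Query each host given as an argument (port from $PORT, default 2181).
watch_shares() {
  local host
  for host in "$@"; do
    printf '%s %s\n' "$host" \
      "$(echo wchs | nc "$host" "${PORT:-2181}" | total_watches)"
  done | print_shares
}
```

For example, `watch_shares zoo-ensemble1 zoo-ensemble2 zoo-ensemble3`. Fed the totals above (1914, 1010, 952), `print_shares` reports 49%, 26% and 24%. Since a watch lives on the server the client's session is connected to, the skew mostly reflects where clients happened to connect (note the 198 connections on zoo-ensemble1).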
-
Re: determining zookeeper capacity requirements
Ted Dunning, 2012-12-05, 23:37
This looks like very low load.

What is the rate of change on znodes (i.e. what is the desired watch signal rate)?
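One rough way to put a number on this question, sketched under assumptions: every committed write (including session creation and close) advances the server's zxid, whose low 32 bits are a per-epoch transaction counter, so sampling the "Zxid:" line of `srvr` twice gives an approximate transactions-per-second figure. It counts all writes, not only the data changes that fire watches, and the actual watch signal rate also depends on how many watchers each changed path has. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: estimate write (transaction) rate from zxid deltas.

# Read `srvr` (or `stat`) output on stdin; print the Zxid as a decimal number.
zxid_of() {
  local hex
  hex=$(awk '/^Zxid:/ { sub(/\r$/, "", $2); print $2 }')
  echo $(( ${hex:-0} ))   # bash arithmetic understands the 0x prefix
}

# Transactions per second between two zxid samples taken $3 seconds apart.
# The low 32 bits count transactions within an epoch; the high 32 bits are the
# epoch, which changes (and the counter restarts) when a new leader is elected.
txn_rate() {
  local z1=$1 z2=$2 interval=$3
  if (( (z1 >> 32) != (z2 >> 32) )); then
    echo "epoch changed between samples; resample" >&2
    return 1
  fi
  echo $(( ((z2 & 0xffffffff) - (z1 & 0xffffffff)) / interval ))
}

# Sample one server twice, $2 seconds apart (default 10), and print the rate.
sample_txn_rate() {
  local host=$1 interval=${2:-10} z1 z2
  z1=$(echo srvr | nc "$host" "${PORT:-2181}" | zxid_of)
  sleep "$interval"
  z2=$(echo srvr | nc "$host" "${PORT:-2181}" | zxid_of)
  txn_rate "$z1" "$z2" "$interval"
}
```

For example, `sample_txn_rate zoo-ensemble1 30`. The estimate is only meaningful when no leader election happens inside the window, which is why `txn_rate` refuses to answer across an epoch change.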
-
Re: determining zookeeper capacity requirements
Ian Kallen, 2012-12-06, 00:50
Thanks for replying. AFAIK, the change rate isn't high. That said, there's a Storm cluster and a few other things whose internals I'm not familiar with; they may be poking their znodes at a higher rate than I realize. The missed watches are on applications that don't have rapid changes in any of their nodes. But we regularly see clients not fire data watches; subsequent changes do fire them, so the clients seem to be connected, just missing that first trigger. Also, latencies sometimes swing pretty widely. So it had me wondering how to measure capacity utilization on the ensemble.
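For measuring utilization, ZooKeeper 3.4 and later expose server gauges through the `mntr` four-letter word, one tab-separated key/value pair per line. A minimal sketch that keeps a handful of load-related keys per server; which keys matter, and what thresholds to alarm on, are assumptions to tune for your workload, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep a few load-related gauges from `mntr` (stdin).
# mntr prints one "key<TAB>value" pair per line.
mntr_summary() {
  awk -F'\t' '
    $1 ~ /^zk_(avg_latency|max_latency|outstanding_requests|num_alive_connections|watch_count|znode_count|approximate_data_size)$/ {
      print $1 "=" $2
    }'
}

# Print one summary line per host (port from $PORT, default 2181).
mntr_poll() {
  local host
  for host in "$@"; do
    echo "$host $(echo mntr | nc "$host" "${PORT:-2181}" | mntr_summary | paste -sd' ' -)"
  done
}
```

For example, `mntr_poll zoo-ensemble1 zoo-ensemble2 zoo-ensemble3` from cron, logged over time. Sending `srst` beforehand resets the server statistics, so that zk_max_latency reflects a recent window rather than the server's whole uptime.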
-
Re: determining zookeeper capacity requirements
Ted Dunning, 2012-12-06, 04:51
This sounds like configuration somewhere.

Have you checked the usual suspects:

a) GC on client or ZK cluster?
b) bad configuration on ZK which allows split quorum? (really... surprisingly common)
c) bad configuration on client for connect?
d) ZK swapping out due to inactivity during memory pressure?
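For suspect (b), one quick check is that every member's zoo.cfg lists exactly the same server.N entries. A sketch, assuming each server's config has been copied to a local file; it ignores whitespace and line order, and it does not check the myid files, which also need to match their server.N entries. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: confirm every copy of zoo.cfg lists the same quorum.

# Print the server.N lines of one config, whitespace removed and sorted.
quorum_members() {
  grep -E '^[[:space:]]*server\.[0-9]+[[:space:]]*=' "$1" \
    | sed -E 's/[[:space:]]+//g' \
    | sort
}

# Compare each config against the first; report differences on stderr.
# Returns non-zero if any config disagrees.
check_quorum_configs() {
  local ref=$1 f out rc=0
  shift
  for f in "$@"; do
    if ! out=$(diff <(quorum_members "$ref") <(quorum_members "$f")); then
      printf 'server list in %s differs from %s:\n%s\n' "$f" "$ref" "$out" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_quorum_configs zoo1.cfg zoo2.cfg zoo3.cfg` (hypothetical file names). A server whose list disagrees with its peers can end up voting in a different quorum than the others expect.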
-
Re: determining zookeeper capacity requirements
Ian Kallen, 2012-12-06, 19:25
On Wed, Dec 5, 2012 at 8:51 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> This sounds like configuration somewhere.
>
> Have you checked the usual suspects:
>
> a) GC on client or ZK cluster?

We don't have this instrumented yet; I'll raise the priority on it, though.

> b) bad configuration on ZK which allows split quorum? (really....
> surprisingly common)

I think we're good there.

> c) bad configuration on client for connect?

I think we're good there, too.

> d) ZK swapping out due to inactivity during memory pressure?

Can you cite an explanation or explain this here? I'm not sure what to look for. It doesn't seem to be clients failing to detect that they've lost their session and creating a new one without the watches, since "retrying" a znode update does trigger the watch on the lapsed clients.

thanks!
-Ian
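To test the lapsed-session theory directly, one option is to check, before making the triggering change, whether the client's session still has a watch registered on the path. The `wchp` four-letter word lists watches by path (`wchc` lists them by session); the ZooKeeper admin guide warns that both can be expensive on busy servers. The parser below assumes each path is printed on its own line followed by one tab-indented 0x session id per watcher; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the sessions watching one path, from `wchp` output
# on stdin. Assumes the layout: a path on its own line, then one tab-indented
# 0x... session id per watcher on that path.
watchers_of() {
  awk -v want="$1" '
    /^[^ \t]/ { cur = $0; next }                    # a path line starts a group
    NF && cur == want { gsub(/[ \t\r]/, ""); print }'
}

# Ask each host for its by-path watch list and print "host session" pairs
# for the given path (port from $PORT, default 2181).
find_watchers() {
  local path=$1 host sid
  shift
  for host in "$@"; do
    echo wchp | nc "$host" "${PORT:-2181}" | watchers_of "$path" |
      while read -r sid; do echo "$host $sid"; done
  done
}
```

Each server only reports sessions connected to it, so query all three, e.g. `find_watchers /some/path zoo-ensemble1 zoo-ensemble2 zoo-ensemble3` (the path is a placeholder), and compare the result with the session id the client reports (the Java client exposes it via `ZooKeeper.getSessionId()`).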
-
Re: determining zookeeper capacity requirements
Ted Dunning, 2012-12-07, 01:34
Is there any swap activity on the client or server boxes?
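A minimal Linux-only sketch for answering the swap question: /proc/vmstat keeps cumulative pswpin/pswpout page counters, so sampling them over an interval shows whether anything is being swapped right now (`vmstat 1` reports the same activity in its si/so columns). The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (Linux-only): measure swap traffic from /proc/vmstat.

# Print cumulative "pages_swapped_in pages_swapped_out" since boot.
# The optional file argument lets the parser read a saved copy instead.
swap_counters() {
  awk '$1 == "pswpin"  { i = $2 }
       $1 == "pswpout" { o = $2 }
       END { print i + 0, o + 0 }' "${1:-/proc/vmstat}"
}

# Print pages swapped in/out during a window of $1 seconds (default 10).
swap_activity() {
  local interval=${1:-10} in1 out1 in2 out2
  read -r in1 out1 < <(swap_counters)
  sleep "$interval"
  read -r in2 out2 < <(swap_counters)
  echo "swap in: $((in2 - in1)) pages, swap out: $((out2 - out1)) pages over ${interval}s"
}
```

Running `swap_activity 10` on the ZooKeeper servers and client hosts during a latency spike would show whether the JVMs are being paged out at that moment.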
-
Re: determining zookeeper capacity requirements
Ian Kallen, 2012-12-07, 17:50
No swap; we keep swap turned off, following the "better to OOM than slow down" theory. I'm gonna try to get GCs plotted for all of these JVMs. I'm also finding instances where watches are fired but an exception thrown inside the watcher kills the thread without any exception handling or logging; I'm wrapping these. But beyond all of that, I'm still on the hunt for guidelines for ZK capacity sizing, to get a sense of the performance and stability we should expect with different heap sizes, ensemble sizes, numbers of clients, unique watches, unique paths and update rates.
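Once GC logging is on, a useful number to plot against ZooKeeper behaviour is the longest stop-the-world pause, since a pause approaching the session timeout can make a live client or server look dead. A sketch assuming HotSpot Java 7/8-style logs written with -XX:+PrintGCApplicationStoppedTime (JDK 9+ unified logging uses a different format); `max_pause` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the longest stop-the-world pause in a HotSpot
# GC log. Assumes a Java 7/8-era JVM started with flags such as
#   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -Xloggc:<file>
# (for the ZooKeeper server these would typically go in JVMFLAGS; check how
# your version's startup scripts pass JVM options).
max_pause() {
  awk '/Total time for which application threads were stopped:/ {
         s = $0
         sub(/.*were stopped: /, "", s)   # drop everything before the number
         sub(/ seconds.*/, "", s)         # drop the unit and any trailing fields
         if (s + 0 > max) max = s + 0
       }
       END { printf "%.3f\n", max + 0 }' "$@"
}
```

Compare the result with the negotiated session timeout, which the server by default bounds to between 2 and 20 times tickTime.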
-
Re: determining zookeeper capacity requirements
Ted Dunning, 2012-12-07, 19:24
On Fri, Dec 7, 2012 at 6:50 PM, Ian Kallen <[EMAIL PROTECTED]> wrote:
> No swap, we keep swap turned off following the "better to OOM than
> slow down" theory.

Good theory, except that it may compromise getting cores.

> I'm gonna try to get GC's plotted for all of these
> JVMs. I'm also finding instances where watches are fired but an
> exception thrown inside the watch evaporates the thread without
> exception handling and logging; I'm wrapping these.

Excellent.