Vishal Kathuria 2011-05-27, 22:32
Fournier, Camille F. [Tec... 2011-05-27, 23:23
Benjamin Reed 2011-05-30, 03:41
Vishal Kathuria 2011-05-30, 18:15
Dhruba Borthakur 2011-07-01, 21:49
RE: Discussion on supporting a large number of clients for a zk ensemble
Thanks for the suggestion Dhruba.
I will open a Jira and continue the discussion there. I also got a chance to discuss some of the ideas at the zookeeper community meet yesterday.
I have prototyped some of my ideas and should soon be able to share the performance scenarios and measurements too.
From: Dhruba Borthakur [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 01, 2011 2:50 PM
To: [EMAIL PROTECTED]
Subject: Re: Discussion on supporting a large number of clients for a zk ensemble
Hi Ben/Camille: can you comment on Vishal's logs/config? The "local session"
idea seems promising to me.
Vishal: it would be nice if you could create a JIRA with your proposal so we can continue the discussion there.
thanks a bunch,
On Mon, May 30, 2011 at 11:15 AM, Vishal Kathuria <[EMAIL PROTECTED]> wrote:
> Thanks for looking at this Camille and Benjamin,
> There are 5 machines, 2 hosting clients and 3 hosting servers.
> There is one client process on each of the client machines. The client
> process has 20 threads, each thread with 500 sessions.
> So I have a total of 20K clients, which isn't really that high.
> Hardware: two Intel(R) Xeon(R) L5420 processors (8 cores total), 8 GB RAM.
> The workload is fairly simple:
> All the sessions do is keep a watch on a node. Once the watch fires, the
> client reads the contents of the node and sets the watch again.
> There is one thread that periodically updates the node being watched
> (once every 30s, so very infrequently).
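
A minimal sketch of the workload described above, to make the per-update fan-out concrete. It is an in-memory model only: `FakeNode`, `Session`, and `simulate` are hypothetical names invented for illustration, and nothing here uses the real ZooKeeper client API. It assumes one-shot watches that each session re-registers when it re-reads the node, as the description says.

```python
class FakeNode:
    """In-memory stand-in for one watched node (NOT the ZooKeeper API)."""

    def __init__(self, data=b""):
        self.data = data
        self.watchers = []   # one-shot watch callbacks currently registered
        self.reads = 0       # number of get_data calls served
        self.events = 0      # number of watch notifications delivered

    def get_data(self, watcher=None):
        """Return the node contents and optionally register a one-shot watch."""
        self.reads += 1
        if watcher is not None:
            self.watchers.append(watcher)
        return self.data

    def set_data(self, data):
        """Update the node and fire every registered watch exactly once."""
        self.data = data
        # Swap the list out first so watches re-registered inside the
        # callbacks land on the fresh list and are not fired again now.
        fired, self.watchers = self.watchers, []
        for callback in fired:
            self.events += 1
            callback()


class Session:
    """One client session: watch the node, re-read and re-watch on change."""

    def __init__(self, node):
        self.node = node
        self.last_seen = node.get_data(watcher=self.on_change)

    def on_change(self):
        # The watch fired: read the contents and set the watch again.
        self.last_seen = self.node.get_data(watcher=self.on_change)


def simulate(machines=2, threads=20, sessions_per_thread=500, updates=3):
    """Return (session count, watch events, reads) for the given updates."""
    node = FakeNode(b"v0")
    sessions = [Session(node)
                for _ in range(machines * threads * sessions_per_thread)]
    node.reads = 0  # count only the reads triggered by updates
    for i in range(1, updates + 1):
        node.set_data(f"v{i}".encode())
    return len(sessions), node.events, node.reads


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions, each update fans out to 20,000 watch notifications followed by 20,000 reads that re-register the watch. At one update every 30s that averages to roughly 667 reads per second across the ensemble, although the reads arrive in a burst right after each update rather than evenly.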
> When the system starts off, things are fine, then a few timers start
> being missed and eventually there are lots of expired connections.
> The logs are really long, but pretty much repetitive, so I am attaching the
> tail of the logs.
> The client timeout is 300s
> JVM Parameters
> -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:MaxGCPauseMillis=50
> -Dzookeeper.globalOutstandingLimit=30000 -Xms6000m -Xmx6000m -Xdebug
> I have GC logging turned on. I am not seeing long GC pauses, so I don't
> think that's it.
> Next steps I am trying
> 1. Look at the CPU utilization on the server machines
> 2. If the CPU is pegged at 100%, add some additional tracing in the server
> to validate my hypothesis that the session tracker is getting overwhelmed
> If you folks have any other suggestions, that would greatly help. I started
> working with zookeeper a couple of weeks ago so it is very likely I might be
> missing something obvious.
> -----Original Message-----
> From: Benjamin Reed [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, May 29, 2011 8:42 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Discussion on supporting a large number of clients for a zk
> ensemble
> i second camille's suggestion. i also know there are other people looking
> into using zookeeper with a large number of clients, so it would be good to
> figure out what are the limits and then how to cross them. i like your
> proposed solutions, but i would rather start down that road after we have
> resolved the issues that we can for the normal clients.
> On Fri, May 27, 2011 at 4:23 PM, Fournier, Camille F. [Tech] <
> [EMAIL PROTECTED]> wrote:
> > I would recommend that you spend some time making sure that your guess
> > about the cause is correct before trying to design solutions to the problem.
> > Can you provide us some hard numbers, logs, and configuration information?
> > It's always possible that some aspect of your configuration that you hadn't
> > considered important is in fact the trigger here.
> > Thanks,
> > Camille
> > -----Original Message-----
> > From: Vishal Kathuria [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, May 27, 2011 6:32 PM
> > To: [EMAIL PROTECTED]
> > Subject: Discussion on supporting a large number of clients for a zk
> > ensemble
> > Hi Folks,
> > I wanted to start a discussion on how we can support a large number of
Connect to me at http://www.facebook.com/dhruba