

RE: Discussion on supporting a large number of clients for a zk ensemble
Thanks for the suggestion Dhruba.
I will open a Jira and continue the discussion there. I also got a chance to discuss some of the ideas at the zookeeper community meet yesterday.

I have prototyped some of my ideas and should soon be able to share the performance scenarios and measurements too.

Thanks!
Vishal

-----Original Message-----
From: Dhruba Borthakur [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 01, 2011 2:50 PM
To: [EMAIL PROTECTED]
Subject: Re: Discussion on supporting a large number of clients for a zk ensemble

Hi Ben/Camille: can you comment on Vishal's logs/config? The "local session"
idea seems promising to me.

Vishal: it would be nice if you could create a JIRA with your proposal so we can continue the discussion there.

thanks a bunch,
dhruba

On Mon, May 30, 2011 at 11:15 AM, Vishal Kathuria <[EMAIL PROTECTED]> wrote:

> Thanks for looking at this Camille and Benjamin,
>
> setup:
> There are 5 machines, 2 hosting clients and 3 hosting servers.
> There is one client process on each of the client machines. Each client
> process has 20 threads, and each thread holds 500 sessions.
> That gives a total of 20K clients, which isn't really that high.
>
> Hardware
> Two Intel(r) Xeon(r) L5420 processors (8 cores total), 8G RAM
>
>
> The workload is fairly simple:
> All each session does is keep a watch on a node. Once the watch fires, the
> client reads the contents of the node and sets the watch again.
> There is one thread that is periodically updating the node being watched
> (once every 30s - so very infrequent)
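
The workload described above can be sketched as a small in-memory simulation. This is only an illustration of the watch, read, re-watch cycle and of the fan-out that each update causes; it does not use the ZooKeeper client library and says nothing about server-side cost. The names FakeNode, FakeSession and run are hypothetical, and the only figures taken from the thread are the machine, thread and session counts and the 30s update interval.

```python
"""In-memory sketch of the watch / read / re-watch workload (no ZooKeeper client involved)."""

# Figures taken from the setup described in the thread.
CLIENT_MACHINES = 2
THREADS_PER_PROCESS = 20      # one client process per client machine
SESSIONS_PER_THREAD = 500
UPDATE_INTERVAL_S = 30

TOTAL_SESSIONS = CLIENT_MACHINES * THREADS_PER_PROCESS * SESSIONS_PER_THREAD


class FakeNode:
    """Hypothetical stand-in for a watched node with one-shot watches."""

    def __init__(self, data):
        self.data = data
        self.watchers = []

    def read_and_watch(self, callback):
        # Return the current contents and register a one-shot watch.
        self.watchers.append(callback)
        return self.data

    def set_data(self, data):
        # Swap the watcher list out first, so watches re-registered inside
        # the callbacks wait for the next update instead of firing now.
        self.data = data
        fired, self.watchers = self.watchers, []
        for callback in fired:
            callback()
        return len(fired)


class FakeSession:
    """Hypothetical client session: read the node, watch it, repeat on change."""

    def __init__(self, node):
        self.node = node
        self.reads = 0
        self.last_seen = None
        self._read()

    def _read(self):
        # Reading and re-registering the watch happen together.
        self.last_seen = self.node.read_and_watch(self._read)
        self.reads += 1


def run(updates):
    """Create all sessions, apply `updates` writes, return the resulting state."""
    node = FakeNode("v0")
    sessions = [FakeSession(node) for _ in range(TOTAL_SESSIONS)]
    fired = [node.set_data("v%d" % i) for i in range(1, updates + 1)]
    return node, sessions, fired


if __name__ == "__main__":
    node, sessions, fired = run(updates=3)
    print("sessions:", TOTAL_SESSIONS)
    print("watches fired per update:", fired)
    print("average notifications per second:", TOTAL_SESSIONS / UPDATE_INTERVAL_S)
```

With these numbers, every update to the node fires 20,000 watches, each followed by one read and one watch re-registration. Averaged over the 30s interval that is roughly 667 notifications per second, although in practice they would arrive as a burst right after each update.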
>
> When the system starts off, things are fine; then a few timers start
> missing and eventually there are lots of expired connections.
>
> The logs are really long, but pretty much repetitive, so I am attaching the
> tail of the logs.
> The client timeout is 300s
>
> JVM Parameters
> -XX:+UseConcMarkSweepGC  -XX:+PrintGCDetails -XX:MaxGCPauseMillis=50
> -Dzookeeper.globalOutstandingLimit=30000 -Xms6000m -Xmx6000m -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8180
> I have GC logging turned on. I am not seeing long GC pauses, so I don't
> think that's it.
>
> Next steps I am trying
> 1. Look at the CPU utilization on the server machines
> 2. If the CPU is pegged at 100%, add some additional tracing in the server
> to validate my hypothesis that the session tracker is getting overwhelmed
>
> If you folks have any other suggestions, that would greatly help. I started
> working with zookeeper a couple of weeks ago so it is very likely I might be
> missing something obvious.
>
>
> Thanks!
> Vishal
>
> -----Original Message-----
> From: Benjamin Reed [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, May 29, 2011 8:42 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Discussion on supporting a large number of clients for a zk
> ensemble
>
> i second camille's suggestion. i also know there are other people looking
> into using zookeeper with a large number of clients, so it would be good to
> figure out what are the limits and then how to cross them. i like your
> proposed solutions, but i would rather start down that road after we have
> resolved the issues that we can for the normal clients.
>
> ben
>
> On Fri, May 27, 2011 at 4:23 PM, Fournier, Camille F. [Tech] <
> [EMAIL PROTECTED]> wrote:
> > I would recommend that you spend some time making sure that your guess
> > about the cause is correct before trying to design solutions to the problem.
> > Can you provide us some hard numbers, logs, and configuration information?
> > It's always possible that some aspect of your configuration that you hadn't
> > considered important is in fact the trigger here.
> >
> > Thanks,
> > Camille
> >
> > -----Original Message-----
> > From: Vishal Kathuria [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, May 27, 2011 6:32 PM
> > To: [EMAIL PROTECTED]
> > Subject: Discussion on supporting a large number of clients for a zk
> > ensemble
> >
> > Hi Folks,
> > I wanted to start a discussion on how we can support a large number of
