Zookeeper >> mail # dev >> Discussion on supporting a large number of clients for a zk ensemble


Vishal Kathuria 2011-05-27, 22:32
Fournier, Camille F. [Tec... 2011-05-27, 23:23
Benjamin Reed 2011-05-30, 03:41
Vishal Kathuria 2011-05-30, 18:15
Dhruba Borthakur 2011-07-01, 21:49
RE: Discussion on supporting a large number of clients for a zk ensemble
Thanks for the suggestion Dhruba.
I will open a Jira and continue the discussion there. I also got a chance to discuss some of the ideas at the zookeeper community meet yesterday.

I have prototyped some of my ideas and I should soon be able to share the performance scenarios and measurements too.

Thanks!
Vishal

-----Original Message-----
From: Dhruba Borthakur [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 01, 2011 2:50 PM
To: [EMAIL PROTECTED]
Subject: Re: Discussion on supporting a large number of clients for a zk ensemble

Hi Ben/Camille: can you comment on Vishal's logs/config? The "local session"
idea seems promising to me.

Vishal: it would be nice if you create a JIRA with your proposal and we can continue discussion in the JIRA?

thanks a bunch,
dhruba

On Mon, May 30, 2011 at 11:15 AM, Vishal Kathuria <[EMAIL PROTECTED]> wrote:

> Thanks for looking at this Camille and Benjamin,
>
> setup:
> There are 5 machines, 2 hosting clients and 3 hosting servers.
> There is one client process on each of the client machines. The client
> process has 20 threads, each thread with 500 sessions.
> So I have a total of 20K clients, so it isn't that high really.
>
> Hardware
> Two proc Intel(r) Xeon(r) Processor L5420  (total 8 cores) 8G RAM
>
>
> The workload is fairly simple:
> All sessions do is keep a watch on a node. Once the watch fires, the client
> reads the contents of the node and puts the watch again.
> There is one thread that is periodically updating the node being watched
> (once every 30s - so very infrequent)
>
> When the system starts off, things are fine, then a few timers start
> missing and eventually there are lots of expired connections.
>
> The logs are really long, but pretty much repetitive, so I am attaching the
> tail of the logs.
> The client timeout is 300s
>
> JVM Parameters
> -XX:+UseConcMarkSweepGC  -XX:+PrintGCDetails -XX:MaxGCPauseMillis=50
> -Dzookeeper.globalOutstandingLimit=30000 -Xms6000m -Xmx6000m -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8180
> I have GC logging turned on. I am not seeing long GC pauses, so I don't
> think that's it.
>
> Next steps I am trying
> 1. Look at the CPU utilization on the server machines
> 2. If the CPU is pegged at 100%, add some additional tracing in the server
> to validate my hypothesis that the session tracker is getting overwhelmed
>
> If you folks have any other suggestions, that would greatly help. I started
> working with zookeeper a couple of weeks ago so it is very likely I might be
> missing something obvious.
>
>
> Thanks!
> Vishal
>
> -----Original Message-----
> From: Benjamin Reed [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, May 29, 2011 8:42 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Discussion on supporting a large number of clients for a zk
> ensemble
>
> i second camille's suggestion. i also know there are other people looking
> into using zookeeper with a large number of clients, so it would be good to
> figure out what are the limits and then how to cross them. i like your
> proposed solutions, but i would rather start down that road after we have
> resolved the issues that we can for the normal clients.
>
> ben
>
> On Fri, May 27, 2011 at 4:23 PM, Fournier, Camille F. [Tech] <
> [EMAIL PROTECTED]> wrote:
> > I would recommend that you spend some time making sure that your guess
> > about the cause is correct before trying to design solutions to the problem.
> > Can you provide us some hard numbers, logs, and configuration information?
> > It's always possible that some aspect of your configuration that you hadn't
> > considered important is in fact the trigger here.
> >
> > Thanks,
> > Camille
> >
> > -----Original Message-----
> > From: Vishal Kathuria [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, May 27, 2011 6:32 PM
> > To: [EMAIL PROTECTED]
> > Subject: Discussion on supporting a large number of clients for a zk
> > ensemble
> >
> > Hi Folks,
> > I wanted to start a discussion on how we can support a large number of

Connect to me at http://www.facebook.com/dhruba
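
The workload described in the quoted setup (20 threads x 500 sessions on each of 2 client machines, every session watching one node, reading it when the watch fires, and putting the watch again) can be modelled with a short sketch. This is an in-memory illustration only: it does not use the ZooKeeper client API, and the names WatchWorkloadSketch, WatchedNode, Session, totalSessions and simulate are hypothetical, not taken from the thread. It assumes watches are one-shot, as the "puts the watch again" step implies, and it ignores networking, session heartbeats and timeouts entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory model of the one-shot watch workload; NOT the ZooKeeper client API. */
public class WatchWorkloadSketch {

    /** Hypothetical stand-in for the watched node: data plus one-shot watchers. */
    static final class WatchedNode {
        private byte[] data;
        private List<Runnable> watchers = new ArrayList<>();

        WatchedNode(byte[] initial) { this.data = initial; }

        /** Read the data and register a one-shot watcher in the same call. */
        synchronized byte[] readAndWatch(Runnable watcher) {
            watchers.add(watcher);
            return data;
        }

        /** Update the data and fire each registered watcher exactly once. */
        void update(byte[] newData) {
            List<Runnable> toFire;
            synchronized (this) {
                data = newData;
                toFire = watchers;
                watchers = new ArrayList<>(); // one-shot: clear before firing
            }
            for (Runnable w : toFire) w.run();
        }

        synchronized int pendingWatches() { return watchers.size(); }
    }

    /** Hypothetical stand-in for one client session. */
    static final class Session {
        private final WatchedNode node;
        private final AtomicLong reads;

        Session(WatchedNode node, AtomicLong reads) {
            this.node = node;
            this.reads = reads;
        }

        /** Read the node and put the watch again; runs at start and on each event. */
        void readAndRewatch() {
            node.readAndWatch(this::readAndRewatch);
            reads.incrementAndGet();
        }
    }

    /** Total sessions = client machines x threads per process x sessions per thread. */
    static int totalSessions(int machines, int threadsPerProcess, int sessionsPerThread) {
        return machines * threadsPerProcess * sessionsPerThread;
    }

    /** Runs the model and returns the total number of reads issued by all sessions. */
    static long simulate(int sessions, int updates) {
        WatchedNode node = new WatchedNode(new byte[] {0});
        AtomicLong reads = new AtomicLong();
        for (int i = 0; i < sessions; i++) {
            new Session(node, reads).readAndRewatch();
        }
        for (int u = 1; u <= updates; u++) {
            node.update(new byte[] {(byte) u});
        }
        return reads.get();
    }

    public static void main(String[] args) {
        int sessions = totalSessions(2, 20, 500); // figures from the quoted setup
        long reads = simulate(sessions, 3);
        System.out.println("sessions=" + sessions + " reads=" + reads);
    }
}
```

With the figures from the thread this prints sessions=20000 reads=80000: one initial read per session plus one read per session for each of the three updates. At one update every 30s, that is a burst of 20,000 notifications and re-reads per update in this model. Whether that burst or the session tracker explains the expirations is not something the sketch can show; it only makes the shape of the load concrete.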