ZooKeeper user mailing list - Distribution Problems With Multiple Zookeeper Clients


Re: Distribution Problems With Multiple Zookeeper Clients
Camille Fournier 2012-05-25, 16:48
If your code is doing the following:
client gets watch notification
client immediately tries to grab lock
client then puts job in queue to process

That's not going to work.

You need to do:
client gets watch notification
client puts the lock grab in the same queue as the work being processed
when the queue has bandwidth, try to grab the lock and process the job

Grabbing the lock to do work and the queue of threads available to do the
work need to be coupled; otherwise you are grabbing work you don't have the
capacity to do.
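
A minimal sketch of that coupling, assuming hypothetical tryGrabLock /
processJob / releaseLock helpers (the lock would typically be an ephemeral
znode). Because the pool is fixed-size, a lock grab only runs once a worker
thread is actually free:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CoupledWorker {
    // Pool size == the number of jobs this machine can run at once.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Called from the ZooKeeper watch callback. The lock attempt is queued
    // behind in-flight work, not made immediately on notification.
    public void onWatchNotification(String jobPath) {
        workers.submit(() -> {
            if (tryGrabLock(jobPath)) {   // hypothetical helper
                try {
                    processJob(jobPath);  // hypothetical helper
                } finally {
                    releaseLock(jobPath); // hypothetical helper
                }
            }
            // If another worker already took the job, just drop the task.
        });
    }

    // Stubs standing in for real lock/work code (e.g. ephemeral-znode lock).
    private boolean tryGrabLock(String jobPath) { return false; }
    private void processJob(String jobPath) { }
    private void releaseLock(String jobPath) { }
}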

You can also hack this by:
client gets watch notification
client does a random sleep, or a sleep based on the amount of work currently
on this machine, then tries to grab the lock
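
A minimal sketch of the hack, again with hypothetical helpers; the delay
grows with this machine's in-flight job count, plus a small random component
to break ties between equally loaded machines. (In real code the sleep
should happen off the ZooKeeper event thread, since blocking that thread
delays all other notifications.)

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class BackoffBeforeLock {
    private final AtomicInteger inFlightJobs = new AtomicInteger();

    public void onWatchNotification(String jobPath) throws InterruptedException {
        // Busier machines wait longer before competing for the lock.
        long delayMs = inFlightJobs.get() * 50L
                + ThreadLocalRandom.current().nextLong(25);
        Thread.sleep(delayMs);
        if (tryGrabLock(jobPath)) {       // hypothetical helper
            inFlightJobs.incrementAndGet();
            try {
                processJob(jobPath);      // hypothetical helper
            } finally {
                inFlightJobs.decrementAndGet();
                releaseLock(jobPath);     // hypothetical helper
            }
        }
    }

    private boolean tryGrabLock(String jobPath) { return false; }
    private void processJob(String jobPath) { }
    private void releaseLock(String jobPath) { }
}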

C

On Fri, May 25, 2012 at 12:41 PM, Narasimha Tadepalli <
[EMAIL PROTECTED]> wrote:

> No, actually the server keeps accumulating a lot of jobs in the queue that
> are not picked up by any of the idle worker instances. Those jobs wait
> until the other workers have finished their current jobs. Where exactly do
> you suggest I put the sleeps to prevent the watchers from receiving further
> events? As long as the ZooKeeper session is active I haven't found any way
> to stop these watchers from receiving events. Please advise me if there is
> a way to control it.
>
> Thanks
> Narasimha
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Camille Fournier
> Sent: Thursday, May 24, 2012 5:22 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Distribution Problems With Multiple Zookeeper Clients
>
> You can put random sleeps in after you get a notification, before you try
> to grab the lock, or sleeps based on the active job count, to favor workers
> with no or few jobs in flight. It seems to me that if you have limited the
> number of jobs a worker can process by sizing your thread pool
> appropriately, and you still aren't hitting all 30 servers, maybe you
> don't need 30 servers to be doing these jobs? Is that possible?
>
> C
>
>
> On Thu, May 24, 2012 at 3:55 PM, Narasimha Tadepalli <
> [EMAIL PROTECTED]> wrote:
>
> > Hi Camille
> >
> > I tried to control the job load at the ZooKeeper clients by minimizing
> > the number of jobs to process, but had no luck forcing the other idle
> > workers to pick up the events. I am wondering if there is any way I can
> > force the watcher to stop receiving events, or force the ZooKeeper
> > connection to time out without calling the .close() method so that it
> > retries the connection to the server, which would give the rest of the
> > client instances priority in receiving the events. Appreciate your help
> again.
> >
> > Thanks
> > Narasimha
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> > Camille Fournier
> > Sent: Thursday, May 17, 2012 1:22 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Distribution Problems With Multiple Zookeeper Clients
> >
> > The below is written assuming that all clients are seeing all events,
> > but then they race to get a lock of some sort to do the work, and the
> > same 10 are always getting the lock to do the work. If in fact not all
> > of your clients are even getting all the events, that's another problem.
> >
> >
> > So here's what I think happens, although other devs who know this code
> > better may prove me wrong. When a client connects to a server and
> > creates a watch for a particular path, that member of the ZK quorum
> > adds the watch for that path to a WatchManager. The WatchManager
> > internally keeps a HashSet containing the watches for that path. When
> > an event happens on that path, the server will iterate through the
> > watchers on that path and send them the watch notification.
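
(A rough conceptual model of the per-path dispatch described above; this is
not ZooKeeper's actual WatchManager source, and the Watcher type here is a
simplified stand-in:)

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WatchManagerSketch {
    // Simplified stand-in for the real org.apache.zookeeper.Watcher.
    interface Watcher { void process(String event); }

    // One set of registered watches per znode path.
    private final Map<String, Set<Watcher>> watchTable = new HashMap<>();

    synchronized void addWatch(String path, Watcher w) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>()).add(w);
    }

    // On an event, iterate the set and notify each watcher. ZK watches are
    // one-time triggers, so the set is removed as it fires. Iteration order
    // over a HashSet is arbitrary in principle but can be stable in practice.
    synchronized void triggerWatch(String path, String event) {
        Set<Watcher> watchers = watchTable.remove(path);
        if (watchers == null) return;
        for (Watcher w : watchers) {
            w.process(event);
        }
    }
}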
> > It's quite possible that, if your events are infrequent and/or your
> > client servers aren't that loaded, the first few clients that
> > registered the watch on each quorum member will receive and
> > process the watch first because their