Re: Distribution Problems With Multiple Zookeeper Clients
Again, if you want your clients to perform equal work, you need to balance
when they will take jobs with how many jobs they are currently processing.
If instance1 is doing 100 jobs and it shouldn't be, then there must be a
case when instance1 is running one job and getting the lock to run another,
but instance20 (say) is not running anything. If you want to balance
better, you need to change the way you race to grab the lock to do a job.

I never suggested you were locking a job that wasn't ready to process. But
your clients are locking a job when they are already busy, and this means
that the early clients are doing more work than you want them to. Here's a
pseudo-algorithm that would fix this:
client:
  when (watch notification):
    if (my # jobs in flight == 0):
      try to grab lock immediately
    else:
      wait(# jobs in flight * 100ms * random)
      try to grab lock

Now, if client 1 gets a watch notification but it already has a job in
flight, it's going to sleep a bit before it tries to grab the lock. This
will give the later clients a chance to get the lock first.
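
In Java, that back-off might look roughly like this (sketch only;
tryGrabLock() and processJob() are placeholders for whatever lock recipe and
job handling you already have):

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch only: back off in proportion to how busy this client already
    // is before racing for the lock, so idle clients tend to win the race.
    class BackoffWorker {
        private final AtomicInteger jobsInFlight = new AtomicInteger(0);

        void onWatchNotification() throws InterruptedException {
            int inFlight = jobsInFlight.get();
            if (inFlight > 0) {
                // "# jobs in flight * 100ms * random", as in the
                // pseudo-algorithm above
                long delayMs = (long)
                        (inFlight * 100 * ThreadLocalRandom.current().nextDouble());
                Thread.sleep(delayMs);
            }
            if (tryGrabLock()) {           // placeholder: your lock recipe
                jobsInFlight.incrementAndGet();
                try {
                    processJob();          // placeholder: your job handling
                } finally {
                    jobsInFlight.decrementAndGet();
                }
            }
        }

        private boolean tryGrabLock() { return true; }  // stub for this sketch
        private void processJob() {}                    // stub for this sketch
    }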

A better way to do this is to have a bounded queue of threads to process
locking and work, but I can't write you a pseudo-algorithm for that and I
suspect it would be a bit beyond what you really need.
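
For reference, a rough sketch of that bounded-queue idea (again with
tryGrabLock() and processJob() as placeholders, and the pool size just an
example) could pair a fixed-size thread pool with a semaphore so a client
only races for the lock when a worker is actually free:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    // Sketch only: a fixed pool of worker threads plus a semaphore sized to
    // the pool, so a client never races for the lock unless a worker is free.
    class BoundedWorker {
        private static final int MAX_CONCURRENT_JOBS = 4;  // example value
        private final ExecutorService workers =
                Executors.newFixedThreadPool(MAX_CONCURRENT_JOBS);
        private final Semaphore capacity = new Semaphore(MAX_CONCURRENT_JOBS);

        void onWatchNotification() {
            // tryAcquire() fails fast when all workers are busy, so a busy
            // client simply sits out the race instead of hoarding locks.
            if (capacity.tryAcquire()) {
                if (tryGrabLock()) {       // placeholder: your lock recipe
                    workers.submit(() -> {
                        try {
                            processJob();  // placeholder: your job handling
                        } finally {
                            capacity.release();  // free the slot for the next job
                        }
                    });
                } else {
                    capacity.release();    // lost the race, give the slot back
                }
            }
        }

        private boolean tryGrabLock() { return true; }  // stub for this sketch
        private void processJob() {}                    // stub for this sketch
    }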

C

On Fri, May 25, 2012 at 1:30 PM, Narasimha Tadepalli <
[EMAIL PROTECTED]> wrote:

> Actually we are locking the jobs before accepting new jobs. None of the
> workers will lock a job if it is not ready to process yet. Let me ask you
> this in relation to your second response, where you made some good
> assumptions.
>
> The stats below give a rough estimate of what exactly is going on.
>
> Zookeeper Client                     Total jobs processed in two hours
>
> Client Instance1 ------------------>                 100
> Client Instance2 ------------------>                  90
> Client Instance3 ------------------>                  80
> Client Instance4 ------------------>                  70
> Client Instance5 ------------------>                  60
> Client Instance6 ------------------>                  50
> Client Instance7 ------------------>                  40
> Client Instance8 ------------------>                  30
>
>
> All these instances were started 24 hours ago, at different times, but the
> data I presented here is for the last two hours. Your assumption was that
> Client Instance1 registered with the server first, and that's why it always
> succeeds in the race to receive the event notification first, which turned
> out to be right after verifying the facts. But my problem here is how do I
> force each of these clients to perform equally, or approximately equally?
> I.e., all worker instances should be able to process about 65 jobs in two
> hours (all 8 workers processed 520 jobs, which divided by 8 = 65). As I
> mentioned, it doesn't have to be exactly 65, but not 30 or 100 either. I
> hope you can understand my situation clearly now. BTW, in reality we
> launch between 50 and 100 workers in a day.
>
> Thanks
> Narasimha
>
>
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Camille Fournier
> Sent: Friday, May 25, 2012 11:48 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Distribution Problems With Multiple Zookeeper Clients
>
> If your code is doing the following:
> client gets watch notification
> client immediately tries to grab lock
> client then puts job in queue to process
>
> That's not going to work.
>
> You need to do:
> client gets watch notification
> client puts the lock grab in a queue along with the work being processed;
> when the queue has bandwidth, try to grab the lock and process the job
>
> The grabbing of the lock to do work and the queue of threads available to
> do work need to be coupled, otherwise you are grabbing work you don't have
> capacity to do.
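>
> To make that coupling concrete, the check could look roughly like this
> (sketch only; workQueue and tryGrabLock() are stand-ins for your own
> pieces):
>
>     // Only race for the lock when a worker slot is actually free; the
>     // workQueue is a bounded BlockingQueue drained by the worker threads.
>     if (workQueue.remainingCapacity() > 0 && tryGrabLock()) {
>         workQueue.put(job);
>     }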
>
> You can also hack this by:
> client gets watch notification
> client does a random sleep, or a sleep based on the amount of work
> currently on this machine, then tries to grab the lock
>
> C
>
> On Fri, May 25, 2012 at 12:41 PM, Narasimha Tadepalli <
> [EMAIL PROTECTED]> wrote: