Zookeeper, mail # user - adding a separate thread to detect network timeouts faster


RE: adding a separate thread to detect network timeouts faster
Rakesh R 2013-09-13, 06:24
>>>>>> I think this can be done purely on the client side. Create a separate thread that sends a 4 letter word command like ruok periodically, and close the socket if the client doesn't get the response within a certain amount of time.
Thanks Michi for pointing to '4 letter word command'.

I would like to add one point for the case where there is a large number of clients (as mentioned in the mail threads below), say 50,000 clients with a heartbeat interval of 2 secs. With this ruok approach, there would be the overhead of establishing socket connections if each client sends the ruok command to its respective server. Instead of sending a heartbeat from each zkclient session, the ClientCnxn-side logic could send one heartbeat per host and update the status for all the clients that were created on that host. Any thoughts?

-Rakesh
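
To make the scale concrete: 50,000 clients each probing every 2 secs is 25,000 ruok connections per second in aggregate. A rough sketch of the per-host idea follows, where one probe result is fanned out to every client on the host. The class name `SharedHostMonitor`, its methods, and the injected `probe` callable are all hypothetical and are not part of the ZooKeeper client.

```python
import threading


class SharedHostMonitor:
    """Run one probe per host and fan the result out to many listeners.

    Hypothetical helper for illustration; not a ZooKeeper API.
    """

    def __init__(self, probe):
        self._probe = probe          # callable returning True when healthy
        self._listeners = []
        self._lock = threading.Lock()
        self.probe_count = 0         # number of probes actually sent

    def register(self, listener):
        """Add a callback that receives True/False after each probe."""
        with self._lock:
            self._listeners.append(listener)

    def check_once(self):
        """Probe the server once and notify every registered listener."""
        healthy = bool(self._probe())
        self.probe_count += 1
        with self._lock:
            listeners = list(self._listeners)  # copy so callbacks run unlocked
        for listener in listeners:
            listener(healthy)
        return healthy
```

With this shape, the number of probes depends on the number of hosts rather than the number of client sessions; a timer thread on each host would call `check_once()` at the heartbeat interval.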

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michi Mutsuzaki
Sent: 12 September 2013 23:35
To: Rakesh R
Cc: [EMAIL PROTECTED]; German Blanco
Subject: Re: adding a separate thread to detect network timeouts faster

On Thu, Sep 12, 2013 at 12:05 AM, Rakesh R <[EMAIL PROTECTED]> wrote:
> AFAIK, ping requests would not involve any disk I/O, but they would go through the RequestProcessor chain and execute sequentially.

Yes, that's what I meant. Ping requests don't touch disk, but they do go through the commit processor. So if a ping request is behind a write operation that takes a long time, the ping request will be affected. This is done intentionally to take the disk into account for the heartbeat mechanism.

Anyways, I misunderstood what Jeremy was proposing. He wants to keep the session timeout relatively high to tolerate slow disk, but at the same time detect non-disk failure (node down, network partition) more quickly.

I think this can be done purely on the client side. Create a separate thread that sends a 4 letter word command like ruok periodically, and close the socket if the client doesn't get the response within a certain amount of time.
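
A minimal sketch of the client-side approach described above, assuming a plain TCP probe that is separate from the ZooKeeper session. A healthy server answers ruok with imok and then closes the connection. The names `send_four_letter_word` and `RuokWatchdog`, and the `on_failure` callback (which is where the application would close its client socket), are hypothetical and chosen only for illustration.

```python
import socket
import threading


def send_four_letter_word(host, port, word=b"ruok", timeout=1.0):
    """Send a four-letter-word command and return the raw reply bytes.

    Raises OSError (including socket.timeout) if the server cannot be
    reached or a socket operation exceeds `timeout` seconds.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word)
        chunks = []
        while True:
            data = sock.recv(64)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
        return b"".join(chunks)


class RuokWatchdog(threading.Thread):
    """Probe a server with ruok every `interval` seconds.

    Calls `on_failure()` once and stops when a probe fails or times out.
    """

    def __init__(self, host, port, interval, timeout, on_failure):
        super().__init__(daemon=True)
        self.host, self.port = host, port
        self.interval, self.timeout = interval, timeout
        self.on_failure = on_failure
        self._stop_event = threading.Event()

    def stop(self):
        """Ask the watchdog to exit after the current probe."""
        self._stop_event.set()

    def run(self):
        while not self._stop_event.is_set():
            try:
                reply = send_four_letter_word(
                    self.host, self.port, b"ruok", self.timeout)
                healthy = reply == b"imok"
            except OSError:
                healthy = False
            if not healthy:
                self.on_failure()
                return
            # Sleep until the next probe, waking early if stop() is called.
            self._stop_event.wait(self.interval)
```

A client might start it as `RuokWatchdog("zk1.example.com", 2181, interval=2.0, timeout=1.0, on_failure=close_client_socket).start()`, where the host name and `close_client_socket` are placeholders. Because the probe uses its own short timeout, the session timeout can stay high enough to tolerate a slow disk.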