Zookeeper >> mail # user >> excessive client timeouts tied to NIO select(timeout)

Re: excessive client timeouts tied to NIO select(timeout)
Thanks.

We're running on Amazon EC2, so the network is unpredictable. Our app is doing a ton of other I/O to Amazon S3 as well as to Cassandra.

Even given that, asking for 6 seconds and getting 14-24 seems out of bounds.

The problem only occurs after the app has been running for an hour under heavy load.

Checking GC was a good idea, but I see nothing bad there.

I could raise the session timeout and hardcode the select timeout parameter to 3 or 4 seconds... a hack, but it might get me past this issue.




On Feb 26, 2013, at 1:24 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:

> Is it possible that something else is going on in the application that is
> using this client, or are you observing this happen in a simple test
> client? I don't think a 15-25s wait time is within any reasonable bounds of
> "more or less". This timeout is passed down to native code in the JVM so a
> bug there causing a "more" of that magnitude would probably affect a lot of
> people. I would start looking into what kind of networking conditions could
> be causing such a hang, assuming you don't have full GC happening that is
> pausing the process during this period (which could cause such a long hang).
> On Tue, Feb 26, 2013 at 12:15 PM, Brian Tarbox <[EMAIL PROTECTED]> wrote:
>> The main client loop involves sending keep-alive pings in-between calls to
>> the NIO selector.select call which looks for data from the server
>> (including ping responses).
>> What I've found is that the select() which takes a timeout value takes a
>> hugely varying time to complete.
>> When asking for a max 6 second timeout on the select call I'm in fact
>> staying in the call for 15-25 seconds, which starves the keep-alives
>> and in turn leads to timeouts.
>> Looking at the NIO documentation of the timeout parameter to select it
>> says:
>> timeout - If positive, block for up to timeout milliseconds, *more or less*
>>
>> Has anyone else seen this or have a suggestion for a work around?  This
>> seems like a basic flaw.  If I can't count on timely return from select it
>> seems to break the whole keep-alive scheme.
>> Thanks in advance for any help!
>> Brian Tarbox
>> --
>> http://about.me/BrianTarbox
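
One way to narrow this down is to time Selector.select(timeout) in isolation, outside the ZooKeeper client. The following is only a minimal sketch, not code from the ZooKeeper client: the class name SelectOverrun, the method timeSelect, and the 200 ms timeout are made up for illustration. It opens a selector with no registered channels, so select() has nothing to wake it early and should return close to the requested timeout on a healthy JVM and host.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical stand-alone probe; not part of the ZooKeeper client.
public class SelectOverrun {

    /** Blocks in select(timeoutMs) and returns the elapsed wall time in milliseconds. */
    static long timeSelect(Selector selector, long timeoutMs) throws IOException {
        long start = System.nanoTime();
        selector.select(timeoutMs); // no channels registered, so only the timeout should end this call
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        long timeoutMs = 200; // illustrative value; the thread above used 6 seconds
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < 5; i++) {
                long elapsed = timeSelect(selector, timeoutMs);
                System.out.printf("asked %d ms, blocked %d ms, overrun %d ms%n",
                        timeoutMs, elapsed, elapsed - timeoutMs);
            }
        }
    }
}
```

If a probe like this, run in a spare thread inside the loaded application, shows the same multi-second overruns, the stall is probably affecting the whole process or host (a pause, CPU starvation, or similar) rather than anything specific to the keep-alive loop. If it stays close to the requested timeout, the delay is more likely in the work done around the select call.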