Zookeeper >> mail # user >> excessive client timeouts tied to NIO select(timeout)


Re: excessive client timeouts tied to NIO select(timeout)
Thanks.

We're running on Amazon EC2 so the network is unpredictable.  Our app is doing a ton of other I/O via Amazon S3 as well as to Cassandra.

Even given that, asking for 6 seconds and getting 14-24 seems out of bounds.
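A minimal sketch of how the gap between the requested and the actual time in select() could be measured. The class and method names are hypothetical and not part of the ZooKeeper client; only `Selector.open()`, `Selector.select(long)` and `System.nanoTime()` are standard JDK calls.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical helper for illustration; not ZooKeeper code.
final class SelectOvershoot {

    /** Calls select(timeoutMillis) and returns how long the call took, in milliseconds. */
    static long timedSelect(Selector selector, long timeoutMillis) throws IOException {
        long start = System.nanoTime();          // monotonic clock, unaffected by wall-clock changes
        selector.select(timeoutMillis);          // may return early if a channel becomes ready
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            long requested = 200;                // ms; no channels registered, so this just waits
            long actual = timedSelect(selector, requested);
            System.out.printf("requested=%dms actual=%dms overshoot=%dms%n",
                    requested, actual, actual - requested);
        }
    }
}
```

Logging the overshoot next to a timestamp makes it easier to line the long calls up with whatever else the process was doing at that moment.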

Problem only occurs after app has been running for an hour, with heavy load.

Checking GC was a good idea, but I see nothing bad there.
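One way such a GC check could be done from inside the process is sketched below, assuming the standard `java.lang.management` beans are available. The class name is hypothetical, and the reported collection time is only an approximation of pause time, since concurrent collectors count work that does not stop the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration.
final class GcPauseCheck {

    /** Accumulated GC time in milliseconds, summed over all collectors the JVM reports. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();     // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(200);                       // stand-in for the blocking select() call
        long gcDuringWait = totalGcMillis() - before;
        System.out.println("GC time during the wait: " + gcDuringWait + " ms");
    }
}
```

If the GC time accumulated during a slow select() is close to zero, a full GC is unlikely to explain that particular stall.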

I could raise the session timeout and hardcode the select() timeout parameter to 3 or 4 seconds... a hack, but it might get me past this issue.
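A rough sketch of what capping the select() timeout might look like, written as a standalone helper rather than a patch to the ZooKeeper client. All names here are hypothetical. It only limits how long each call asks to block; it cannot stop a single select() from overshooting.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical helper for illustration; not ZooKeeper code.
final class SlicedSelect {

    /** Clamps a wait to the range [1, capMillis]; select(0) would block indefinitely. */
    static long sliceTimeout(long remainingMillis, long capMillis) {
        return Math.max(1L, Math.min(remainingMillis, capMillis));
    }

    /**
     * Waits up to totalMillis for ready channels without asking any single
     * select() call to block longer than capMillis. Runs betweenSlices after
     * each empty slice (for example, to check whether a ping is due).
     * Returns the number of select() calls made.
     */
    static int selectInSlices(Selector selector, long totalMillis, long capMillis,
                              Runnable betweenSlices) throws IOException {
        long deadline = System.nanoTime() + totalMillis * 1_000_000L;
        int calls = 0;
        while (true) {
            long remaining = (deadline - System.nanoTime()) / 1_000_000L;
            if (remaining <= 0) {
                return calls;                    // overall wait has expired
            }
            int ready = selector.select(sliceTimeout(remaining, capMillis));
            calls++;
            if (ready > 0) {
                return calls;                    // caller should now process the selected keys
            }
            betweenSlices.run();
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            int calls = selectInSlices(selector, 6000, 3000,
                    () -> System.out.println("slice finished, check keep-alive here"));
            System.out.println("select() calls made: " + calls);
        }
    }
}
```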

Brian

 


On Feb 26, 2013, at 1:24 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:

> Is it possible that something else is going on in the application that is
> using this client, or are you observing this happen in a simple test
> client? I don't think a 15-25s wait time is within any reasonable bounds of
> "more or less". This timeout is passed down to native code in the JVM so a
> bug there causing a "more" of that magnitude would probably affect a lot of
> people. I would start looking into what kind of networking conditions could
> be causing such a hang, assuming you don't have full GC happening that is
> pausing the process during this period (which could cause such a long hang).
>
>
> On Tue, Feb 26, 2013 at 12:15 PM, Brian Tarbox <[EMAIL PROTECTED]> wrote:
>
>> The main client loop involves sending keep-alive pings in-between calls to
>> the NIO selector.select call which looks for data from the server
>> (including ping responses).
>>
>> What I've found is that the select() which takes a timeout value takes a
>> hugely varying time to complete.
>>
>> When asking for a max 6 second timeout on the select call I'm in fact
>> staying in the call for 15-25 seconds, which starves the keep-alives
>> and in turn leads to timeouts.
>>
>> Looking at the NIO documentation of the timeout parameter to select it
>> says:
>> timeout - If positive, block for up to timeout milliseconds, *more or less*
>>
>> Has anyone else seen this or have a suggestion for a work around?  This
>> seems like a basic flaw.  If I can't count on timely return from select it
>> seems to break the whole keep-alive scheme.
>>
>> Thanks in advance for any help!
>>
>> Brian Tarbox
>>
>> --
>> http://about.me/BrianTarbox
>>