Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # user >> Zookeeper C client I/O thread delay in poll system call


Copy link to this message
-
Re: Zookeeper C client I/O thread delay in poll system call
Hi Tabrez -

Just wondering - are the number of file descriptors that are being polled
going up rapidly? Are you able to capture that data?

Henry

On 8 December 2010 23:42, Tabrez Memon <[EMAIL PROTECTED]> wrote:

> Hi
>
> I am observing the following error in my app based on the C zookeeper
> client.
> The app is completely idle at all times and it has just created a
> connection
> with the zookeeper server running on the same machine with a session
> timeout of
> 5000ms.
>
> 2010-12-04 10:14:45,593:8710(0x7f641136d910):ZOO_WARN@zookeeper_interest
> @1461:
> Exceeded deadline by 25317ms
> 2010-12-04
> 10:14:45,593:8710(0x7f641136d910):ZOO_ERROR@handle_socket_error_msg@1528:
> Socket
> [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out):
> connection
> timed out (exceeded timeout by 23651ms)
>
> On adding some debug logs, I found that the delay happens in the poll()
> system
> call used in the zookeeper I/O thread (mt_zookeeper.c). I am using a
> session
> timeout of 5 seconds. Thus poll gets called with a timeout of ~1666 ms.
> However
> after running the app for a while, poll gets stuck for a duration much
> higher
> than the specified timeout. Here are the logs which illustrate this
> problem:
>
> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):ZOO_DEBUG@do_io@303: called
> zk
> interest
> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):ZOO_DEBUG@do_io@311: calling
> poll
> with timeout 1666ms
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@313: out of
> poll
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@325: calling
> zookeeper_process
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@327: out of
> zookeeper_process
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@301: calling
> zk
> interest
>
> As you can see the poll call returned after ~26 seconds while it was passed
> a
> timeout of 1.6s.
>
> To see if this problem was related to poll() I replaced it with select
> calls.
> The app ran for a much longer time after this change., however I eventually
> hit
> the same problem again:
>
> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):ZOO_DEBUG@zookeeper_interest
> @1468:
> Time since last receive: 0ms
> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):ZOO_DEBUG@do_io@306: calling
> select with timeout 1666ms
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@308: out of
> select
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@310: select
> returned with 0 ready fds
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@327: calling
> zookeeper_process
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@329: out of
> zookeeper_process
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_WARN@zookeeper_interest
> @1465:
> Exceeded deadline by 2682ms
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@zookeeper_interest
> @1468:
> Time since last receive: 4348ms
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_WARN@zookeeper_interest
> @1472:
> Nothing received from server since last: 4348ms
> 2010-12-08
> 10:59:18,065:20624(0x7f74abe18910):ZOO_ERROR@handle_socket_error_msg@1540:
> Socket [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out):
> connection timed out (exceeded timeout by 1015ms)
>
> In the above run, select was stuck for ~5 seconds inspite of passing a
> timeout
> of 1.6s.
>
> I am running this on Ubuntu 10.10 with kernel version 2.6.35. Has anyone
> else
> observed anything like this? I was looking around for any posts related to
> issues with poll/select on linux, but havent found anything.
>
> Thanks for your help!
> Regards,
> Tabrez
>
>
>
--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB