Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper, mail # user - Zookeeper C client I/O thread delay in poll system call


Copy link to this message
-
Re: Zookeeper C client I/O thread delay in poll system call
Henry Robinson 2010-12-20, 12:57
Hi Tabrez -

Just wondering - are the number of file descriptors that are being polled
going up rapidly? Are you able to capture that data?

Henry

On 8 December 2010 23:42, Tabrez Memon <[EMAIL PROTECTED]> wrote:

> Hi
>
> I am observing the following error in my app based on the C zookeeper
> client.
> The app is completely idle at all times and it has just created a
> connection
> with the zookeeper server running on the same machine with a session
> timeout of
> 5000ms.
>
> 2010-12-04 10:14:45,593:8710(0x7f641136d910):ZOO_WARN@zookeeper_interest
> @1461:
> Exceeded deadline by 25317ms
> 2010-12-04
> 10:14:45,593:8710(0x7f641136d910):ZOO_ERROR@handle_socket_error_msg@1528:
> Socket
> [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out):
> connection
> timed out (exceeded timeout by 23651ms)
>
> On adding some debug logs, I found that the delay happens in the poll()
> system
> call used in the zookeeper I/O thread (mt_zookeeper.c). I am using a
> session
> timeout of 5 seconds. Thus poll gets called with a timeout of ~1666 ms.
> However
> after running the app for a while, poll gets stuck for a duration much
> higher
> than the specified timeout. Here are the logs which illustrate this
> problem:
>
> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):ZOO_DEBUG@do_io@303: called
> zk
> interest
> 2010-12-04 23:45:50,487:3813(0x7f356fc54910):ZOO_DEBUG@do_io@311: calling
> poll
> with timeout 1666ms
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@313: out of
> poll
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@325: calling
> zookeeper_process
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@327: out of
> zookeeper_process
> 2010-12-04 23:46:16,619:3813(0x7f356fc54910):ZOO_DEBUG@do_io@301: calling
> zk
> interest
>
> As you can see the poll call returned after ~26 seconds while it was passed
> a
> timeout of 1.6s.
>
> To see if this problem was related to poll() I replaced it with select
> calls.
> The app ran for a much longer time after this change., however I eventually
> hit
> the same problem again:
>
> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):ZOO_DEBUG@zookeeper_interest
> @1468:
> Time since last receive: 0ms
> 2010-12-08 10:59:13,718:20624(0x7f74abe18910):ZOO_DEBUG@do_io@306: calling
> select with timeout 1666ms
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@308: out of
> select
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@310: select
> returned with 0 ready fds
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@327: calling
> zookeeper_process
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@do_io@329: out of
> zookeeper_process
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_WARN@zookeeper_interest
> @1465:
> Exceeded deadline by 2682ms
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_DEBUG@zookeeper_interest
> @1468:
> Time since last receive: 4348ms
> 2010-12-08 10:59:18,065:20624(0x7f74abe18910):ZOO_WARN@zookeeper_interest
> @1472:
> Nothing received from server since last: 4348ms
> 2010-12-08
> 10:59:18,065:20624(0x7f74abe18910):ZOO_ERROR@handle_socket_error_msg@1540:
> Socket [127.0.0.1:9876] zk retcode=-7, errno=110(Connection timed out):
> connection timed out (exceeded timeout by 1015ms)
>
> In the above run, select was stuck for ~5 seconds inspite of passing a
> timeout
> of 1.6s.
>
> I am running this on Ubuntu 10.10 with kernel version 2.6.35. Has anyone
> else
> observed anything like this? I was looking around for any posts related to
> issues with poll/select on linux, but havent found anything.
>
> Thanks for your help!
> Regards,
> Tabrez
>
>
>
--
Henry Robinson
Software Engineer
Cloudera
415-994-6679