Zookeeper >> mail # user >> Zookeeper session timeouts during RAID Checks


Re: Zookeeper session timeouts during RAID Checks
Hi Srikanth. Do you see any of these in your server logs?

                    LOG.warn("fsync-ing the write ahead log in "
                            + Thread.currentThread().getName()
                            + " took " + syncElapsedMS
                            + "ms which will adversely effect operation latency. "
                            + "See the ZooKeeper troubleshooting guide");
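
If it helps, here is a minimal sketch for pulling those warnings out of a server log together with the reported fsync time. It assumes only the message text built by the LOG.warn call above (including its "effect" spelling); the function name, the min_ms threshold, and the log path in the usage comment are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the message produced by the LOG.warn call quoted above.
# Only the message body is matched, so the log4j prefix layout does not matter.
FSYNC_WARNING = re.compile(r"fsync-ing the write ahead log in (.+?) took (\d+)ms")


def slow_fsyncs(lines: Iterable[str], min_ms: int = 0) -> List[Tuple[str, int]]:
    """Return (thread_name, elapsed_ms) for each fsync warning at or above min_ms."""
    hits = []
    for line in lines:
        match = FSYNC_WARNING.search(line)
        if match is None:
            continue
        elapsed_ms = int(match.group(2))
        if elapsed_ms >= min_ms:
            hits.append((match.group(1), elapsed_ms))
    return hits


# Hypothetical usage (the log path is an assumption, adjust to your install):
#   with open("/var/log/zookeeper/zookeeper.log", errors="replace") as log:
#       for thread, ms in slow_fsyncs(log, min_ms=1000):
#           print(thread, ms)
```

Comparing the timestamps of any hits with the RAID check window and with the client timeouts should show whether slow fsyncs line up with the stalls.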

Patrick

On Mon, Oct 7, 2013 at 11:45 PM, Srikanth R <[EMAIL PROTECTED]> wrote:
> hi zookeepers,
>
> I am running ZooKeeper 3.4.5 as a 3-server ensemble. Its dataDir is on a
> dedicated 6-disk 2.5TB RAID10 volume. Only HDFS namenode/journal txns
> and ZooKeeper txnlog/snapshots are written to this volume. The issue is that
> whenever the weekly RAID check is running, clients that have 5-second
> timeouts time out randomly. Has anyone seen issues like this with dataDir on
> RAID before?
>
> Also, there aren't many writes going into ZK; only hadoop-ha and the HBase
> master use the ZK services.
>
> 1. There are no CPU bottlenecks or memory/swapping issues on the boxes.
> 2. In the ZK strace output, there are a few random 2-3 sec intervals where no
> system calls are recorded, which is weird. Most of the timeouts
> correspond to these intervals, but I have not been able to figure out what
> ZK does during them.
> 3. Enabled GC logs; there is no trace of a full GC during the timeouts. Full
> GCs were recorded over time, but each pause is only 0.3-0.4 secs.
> Also tried the ConcMarkSweep GC without any improvement.
> 4. There are no network errors/timeouts.
> 5. At times I see a max latency of 3-4 secs in connection stats, but avg
> and min latency are 0.
> 6. Ran zk-latencies.py, and latency seems to be the same with and without
> the RAID check.
>
> Here's the zookeeper config
>
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=/data/zookeeper
> clientPort=2181
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> server.1=xyz1:2888:3888
> server.2=xyz2:2888:3888
> server.3=xyz3:2888:3888
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> jaasLoginRenew=3600000
> kerberos.removeHostFromPrincipal=true
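
For reference, the timeout arithmetic implied by this config can be sketched as below. It assumes the documented server defaults (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime, neither overridden here) and the Java client's behaviour of giving up on a server after roughly two thirds of the session timeout without a response; the function names are hypothetical and other client libraries may behave differently.

```python
from typing import Optional


def negotiated_session_timeout_ms(requested_ms: int,
                                  tick_time_ms: int,
                                  min_ms: Optional[int] = None,
                                  max_ms: Optional[int] = None) -> int:
    """Clamp the client's requested timeout to the server's allowed range.

    Assumes the defaults of 2 * tickTime and 20 * tickTime when
    minSessionTimeout / maxSessionTimeout are not set in the config.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


def client_read_timeout_ms(session_timeout_ms: int) -> int:
    """Approximate how long the Java client waits to hear from the server
    before treating the connection as dead (two thirds of the session timeout)."""
    return session_timeout_ms * 2 // 3


session = negotiated_session_timeout_ms(5000, tick_time_ms=2000)
print(session, client_read_timeout_ms(session))
```

Under these assumptions a 5-second client keeps its 5000 ms session but tolerates only about 3.3 seconds of silence, so a server stall in the 3-4 second range (see point 5 above) could be enough to trigger a client-side timeout.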
>
> Partition:
>
> -bash-4.1$ df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/md2              116G   79G   32G  72% /
> tmpfs                  12G     0   12G   0% /dev/shm
> /dev/md0               97M   31M   61M  34% /boot
> /dev/md3              2.6T  297M  2.5T   1% /data
>
> -bash-4.1$ cat /proc/mdstat
> Personalities : [raid10] [raid1]
> md3 : active raid10 sdc5[2] sdd5[3] sda5[0] sdf5[5] sdb5[1] sde5[4]
>       2782511616 blocks super 1.1 512K chunks 2 near-copies [6/6] [UUUUUU]
>       [===================>.]  check = 95.3% (2654099584/2782511616) finish=41.5min speed=51516K/sec
>       bitmap: 0/21 pages [0KB], 65536KB chunk
>
> Here are my questions:
> 1. What is the best way to find out what the ZooKeeper threads are doing?
> (strace hasn't helped much.)
> 2. There isn't much data written to or read from ZK. Why would ZK fail?
> 3. Is it possible to trace all the requests that come in to ZK?
>
> Please let me know if you need more info. Any help is greatly appreciated.
>
> Thanks.
> Srikanth
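
One way to track the server-side latency figures from point 5 while the RAID check runs is to poll the `mntr` four-letter command, which 3.4.x servers answer on the client port with tab-separated key/value lines. The sketch below is only an illustration: the helper names are hypothetical, and xyz1:2181 is taken from the config above.

```python
import socket
from typing import Dict


def four_letter_word(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a four-letter command (e.g. "mntr") and return the raw reply.

    The server closes the connection after replying, so read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> Dict[str, str]:
    """Parse the tab-separated key/value lines of an mntr reply into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that are not key<TAB>value
            stats[key.strip()] = value.strip()
    return stats


# Hypothetical usage against one server of the ensemble:
#   stats = parse_mntr(four_letter_word("xyz1", 2181, "mntr"))
#   print(stats.get("zk_max_latency"), stats.get("zk_avg_latency"))
```

Note that the max latency is cumulative since the last reset, so sending `srst` before each sampling window makes it easier to see which windows contain the multi-second spikes.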