HBase user mailing list: What cause region server to timeout other than long gc?

RE: What cause region server to timeout other than long gc?
I set vm.swappiness = 0 in /etc/sysctl.conf on every region server, based on the performance tuning section of the HBase book.

How do I check for VM swapping?
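A minimal sketch of one way to check this on a Linux region server, reading only standard /proc files. /proc/sys/vm/swappiness shows the value actually in effect (a change in /etc/sysctl.conf applies only after `sysctl -p` or a reboot). SwapTotal and SwapFree in /proc/meminfo show how much swap is in use. The pswpin/pswpout counters in /proc/vmstat count pages swapped in and out since boot, so sampling them twice shows whether the host is swapping right now. The helper name check_swap and the PROC_DIR override (present only so the function can be pointed at test copies of the files) are assumptions of this sketch.

```shell
#!/bin/sh
# Sketch: check swappiness and recent swap activity on one Linux host.
# check_swap is a hypothetical helper; PROC_DIR (default /proc) exists only so
# the function can be pointed at copies of the /proc files for testing.
check_swap() {
  interval="${1:-5}"              # seconds between the two counter samples
  proc="${PROC_DIR:-/proc}"

  # Value currently in effect (sysctl.conf edits apply after "sysctl -p" or reboot).
  echo "vm.swappiness = $(cat "$proc/sys/vm/swappiness")"

  # Swap configured and in use, in kB.
  awk '/^SwapTotal:/ { total = $2 } /^SwapFree:/ { free = $2 }
       END { printf "swap used: %d kB of %d kB\n", total - free, total }' "$proc/meminfo"

  # pswpin/pswpout are cumulative page counts since boot; the difference
  # between two samples is the swapping that happened during the interval.
  in1=$(awk '$1 == "pswpin" { print $2 }' "$proc/vmstat")
  out1=$(awk '$1 == "pswpout" { print $2 }' "$proc/vmstat")
  sleep "$interval"
  in2=$(awk '$1 == "pswpin" { print $2 }' "$proc/vmstat")
  out2=$(awk '$1 == "pswpout" { print $2 }' "$proc/vmstat")
  echo "pages swapped in during ${interval}s: $((in2 - in1))"
  echo "pages swapped out during ${interval}s: $((out2 - out1))"
}

# Example: check_swap 10
```

Running it on each region server, ideally while compactions are running, and looking for non-zero counts is one approach; `vmstat 1` (the si/so columns) and `free -m` give similar information interactively.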

My setup is:
CentOS release 6.1 (Final)
Kernel 2.6.32-131.0.15.el6.x86_64 on an x86_64
Hadoop 1.0.4
HBase 0.94.6
HBASE_REGIONSERVER_OPTS="-XX:+UseParNewGC -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -Xmx6000m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=256M -Xloggc:/data1/hadoop/gc-hbase.log"
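One thing the options above leave open is which old-generation collector the JVM actually runs. In HotSpot, -XX:CMSInitiatingOccupancyFraction only affects the CMS collector, which is enabled by -XX:+UseConcMarkSweepGC. With -XX:+UseParNewGC alone, the old generation is collected by the serial stop-the-world collector, so a full collection of a 6000 MB heap pauses the whole region server. The CMS flag may still arrive through HBASE_OPTS (HBase's default hbase-env.sh template has typically set it there), so checking the running process's command line, as Linux exposes it in /proc/<pid>/cmdline, can be more reliable than reading the config. A minimal sketch follows; uses_cms is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a running JVM was started with the CMS collector.
# uses_cms is a hypothetical helper; its argument is a /proc/<pid>/cmdline
# file, in which Linux stores the process arguments separated by NUL bytes.
uses_cms() {
  # Put one argument per line, then look for the exact flag.
  if tr '\0' '\n' < "$1" | grep -qx -- '-XX:+UseConcMarkSweepGC'; then
    echo "CMS enabled"
  else
    echo "CMS not enabled"
  fi
}

# Example on a region server host (pgrep -f matches against the full command line):
# for pid in $(pgrep -f HRegionServer); do uses_cms "/proc/$pid/cmdline"; done
```

If it reports that CMS is not enabled, adding -XX:+UseConcMarkSweepGC to HBASE_REGIONSERVER_OPTS (or HBASE_OPTS) is what makes CMSInitiatingOccupancyFraction=70 take effect. The GC log written with -XX:+PrintGCDetails also hints at the collector: old-generation entries are labelled "CMS" under CMS and "Tenured" under the serial collector.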

-----Original Message-----
From: Vladimir Rodionov [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 23, 2013 2:31 PM
To: [EMAIL PROTECTED]
Subject: RE: What cause region server to timeout other than long gc?

VM swapping? Did you check swapping?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Henry Hung [[EMAIL PROTECTED]]
Sent: Tuesday, October 22, 2013 8:41 PM
To: [EMAIL PROTECTED]
Subject: What cause region server to timeout other than long gc?

Hi All,

Today 1 of my 9 region servers went down because of a ZooKeeper timeout. This is the log:
2013-10-23 07:41:34,139 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 4 file(s) in cf of MES_PERF_LOG_TIME,\x00\x04\x00\x00\x01A\x9D\xDD\xD9\x8D\x00\x00\x08\xD0fcap2\x00\x00=\xFC,1381922811424.cc6325e7896e3844d124321c315b02b0. into tmpdir=hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp, seqid=2655801231, totalSize=923.8m
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/3ae8adddd77f4a669b93eba2068f5778, keycount=34504344, bloomtype=NONE, size=501.3m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/055361f74fb644f1b3eb48f5da7e3ec0, keycount=18581704, bloomtype=NONE, size=269.5m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/a92cf779dca44843ab4c5130cca68d4f, keycount=9953404, bloomtype=NONE, size=143.9m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/51b89f361eb64f2f8c894a5c1516c604, keycount=664008, bloomtype=NONE, size=9.2m, encoding=NONE
2013-10-23 07:41:34,182 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp/16a28f8fc9f049c0a8b014502faf6339 with permission=rwxrwxrwx
2013-10-23 07:41:34,190 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
2013-10-23 07:41:34,191 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp/16a28f8fc9f049c0a8b014502faf6339: CompoundBloomFilterWriter
2013-10-23 07:43:16,793 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 51848ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-10-23 07:43:16,799 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60261ms for sessionid 0x1412996eb5d2170, closing socket connection and attempting reconnect
2013-10-23 07:43:16,800 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60184ms for sessionid 0x1412996eb5d217e, closing socket connection and attempting reconnect
2013-10-23 07:43:16,807 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50480,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3b17c1aa), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40987","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:16,807 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50159,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@71f59d1a), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40023","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:16,808 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50480,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@380e6d06), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40963","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:18,154 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server fchddn1.ctfab.com/10.16.10.181:2222. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-10-23 07:43:18,155 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to fchddn1.ctfab.com/10.16.10.181:2222, initiating session
2013-10-23 07:43:18,201 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2013-10-23 07:43:18,201 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused
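The Sleeper warning says the process stalled for 51848 ms, so the stall began roughly at 07:42:25 (07:43:16.793 minus 51848 ms). To see whether that lines up with a collection, a minimal sketch can scan the GC log written by -Xloggc (and its rotated copies) for events whose wall-clock "real" time exceeds a threshold. It assumes the HotSpot format produced by -XX:+PrintGCDetails and -XX:+PrintGCDateStamps, where each collection ends with "[Times: user=... sys=..., real=... secs]"; long_gc_pauses is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print GC events whose wall-clock ("real") time exceeds a threshold.
# Assumes HotSpot output from -XX:+PrintGCDetails -XX:+PrintGCDateStamps, where
# each collection ends with "[Times: user=... sys=..., real=N.NN secs]".
# long_gc_pauses is a hypothetical helper name.
# Usage: long_gc_pauses THRESHOLD_SECONDS GC_LOG_FILE...
long_gc_pauses() {
  threshold="$1"
  shift
  awk -v t="$threshold" '
    BEGIN { stamp = "unknown-time" }
    # Remember the latest date stamp (YYYY-MM-DDTHH:MM:SS.mmm+ZZZZ:), minus its colon.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ { stamp = $1; sub(/:$/, "", stamp) }
    # Pull the number out of "real=N.NN" and report it if above the threshold.
    match($0, /real=[0-9.]+/) {
      secs = substr($0, RSTART + 5, RLENGTH - 5) + 0
      if (secs > t + 0) printf "%s real=%.2fs\n", stamp, secs
    }
  ' "$@"
}

# Example (rotated log file names vary by JDK version):
# long_gc_pauses 5 /data1/hadoop/gc-hbase.log*
```

If no event around 07:42 to 07:43 comes close to 50 seconds, the stall more likely happened outside the collector, for example through swapping as Vladimir suggests. A GC event whose real time is much larger than its user and sys times is also commonly read as a sign that the JVM was waiting on the OS rather than doing collection work.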