HBase, mail # user - What cause region server to timeout other than long gc?


Henry Hung 2013-10-23, 03:41
谢良 2013-10-23, 06:53
Henry Hung 2013-10-23, 07:29
Samir Ahmic 2013-10-23, 08:45
Vladimir Rodionov 2013-10-23, 06:30
RE: What cause region server to timeout other than long gc?
Henry Hung 2013-10-23, 07:07
I set vm.swappiness = 0 in /etc/sysctl.conf on every region server, based on the HBase performance tuning guidance in the book.

How do I check for VM swapping?
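Even with vm.swappiness = 0, the kernel can still swap under memory pressure, so it is worth measuring directly. The following is a minimal Python 3 sketch that assumes a Linux /proc filesystem. It reads the current swappiness, the swap in use from /proc/meminfo (SwapTotal minus SwapFree), and the change in the pswpin/pswpout page counters of /proc/vmstat over a short interval. The function names are hypothetical helpers, not part of any HBase or Hadoop tool.

```python
import os
import time

PROC_FILES = ("/proc/meminfo", "/proc/vmstat", "/proc/sys/vm/swappiness")


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the kernel's current vm.swappiness value."""
    with open(path) as f:
        return int(f.read().strip())


def parse_meminfo(text):
    """Parse /proc/meminfo lines like 'SwapTotal:  2097148 kB' into {name: kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def parse_vmstat(text):
    """Parse /proc/vmstat lines like 'pswpin 1234' into {name: count}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_used_kb(meminfo_text):
    """Swap currently in use, in kB (SwapTotal - SwapFree)."""
    mem = parse_meminfo(meminfo_text)
    return mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)


def swap_activity(vmstat_before, vmstat_after):
    """Pages (swapped_in, swapped_out) between two /proc/vmstat snapshots.

    pswpin/pswpout are cumulative since boot, so only the difference
    between two samples shows whether the host is swapping right now.
    """
    a, b = parse_vmstat(vmstat_before), parse_vmstat(vmstat_after)
    return (b.get("pswpin", 0) - a.get("pswpin", 0),
            b.get("pswpout", 0) - a.get("pswpout", 0))


def _read(path):
    with open(path) as f:
        return f.read()


def main(interval_s=1.0):
    if not all(os.path.exists(p) for p in PROC_FILES):
        print("Expected /proc files not found; this sketch assumes Linux.")
        return
    print("vm.swappiness =", read_swappiness())
    print("swap in use (kB) =", swap_used_kb(_read("/proc/meminfo")))
    before = _read("/proc/vmstat")
    time.sleep(interval_s)
    after = _read("/proc/vmstat")
    swapped_in, swapped_out = swap_activity(before, after)
    print("pages swapped in/out over %.1fs: %d/%d"
          % (interval_s, swapped_in, swapped_out))


if __name__ == "__main__":
    main()
```

From the shell, the si/so columns of `vmstat 1` and the Swap row of `free -m` give the same live picture. This sketch only sees the present. If sysstat's `sar` was collecting data, `sar -W` can show swap-in/out rates for the window around the 07:41 to 07:43 outage.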

My setup is:
CentOS release 6.1 (Final)
Kernel 2.6.32-131.0.15.el6.x86_64 on an x86_64
Hadoop 1.0.4
HBase 0.94.6
HBASE_REGIONSERVER_OPTS="-XX:+UseParNewGC -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -Xmx6000m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=256M -Xloggc:/data1/hadoop/gc-hbase.log"
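To confirm that no long collection happened, you could scan the GC log configured by -Xloggc above. The following is a minimal Python 3 sketch. It assumes the HotSpot -XX:+PrintGCDetails format, where each collection reports `[Times: user=U sys=S, real=R secs]`, and the ISO date prefix added by -XX:+PrintGCDateStamps. The function names and the 5-second default threshold are illustrative choices, not anything defined by HBase or the JVM.

```python
import re
import sys

# -XX:+PrintGCDetails reports per-collection times like
# "[Times: user=0.10 sys=0.01, real=0.05 secs]"; "real" is wall-clock time.
TIMES_RE = re.compile(r"user=(\d+\.\d+) sys=(\d+\.\d+), real=(\d+\.\d+) secs")
# -XX:+PrintGCDateStamps prefixes entries like "2013-10-23T07:41:40.123+0800:".
STAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})")


def long_pauses(lines, threshold_s=5.0):
    """Yield (date_stamp, user_s, sys_s, real_s) for collections with real >= threshold_s.

    date_stamp is the most recent date prefix seen, or None if none yet.
    """
    stamp = None
    for line in lines:
        m = STAMP_RE.match(line)
        if m:
            stamp = m.group(1)
        for user_s, sys_s, real_s in TIMES_RE.findall(line):
            if float(real_s) >= threshold_s:
                yield stamp, float(user_s), float(sys_s), float(real_s)


def main(paths, threshold_s=5.0):
    for path in paths:
        with open(path, errors="replace") as f:
            for stamp, user_s, sys_s, real_s in long_pauses(f, threshold_s):
                print("%s: %s real=%.2fs user+sys=%.2fs"
                      % (path, stamp, real_s, user_s + sys_s))


if __name__ == "__main__":
    main(sys.argv[1:])
```

You could run it against `/data1/hadoop/gc-hbase.log*`. The glob is there because -XX:+UseGCLogFileRotation may write numbered files. If `real` is far above `user + sys`, that is a commonly cited hint that the collector was waiting rather than working, for example on swapped-out heap pages.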

-----Original Message-----
From: Vladimir Rodionov [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 23, 2013 2:31 PM
To: [EMAIL PROTECTED]
Subject: RE: What cause region server to timeout other than long gc?

VM swapping? Did you check swapping?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Henry Hung [[EMAIL PROTECTED]]
Sent: Tuesday, October 22, 2013 8:41 PM
To: [EMAIL PROTECTED]
Subject: What cause region server to timeout other than long gc?

Hi All,

Today 1 of my 9 region servers went down because of a ZooKeeper timeout. This is the log:
2013-10-23 07:41:34,139 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 4 file(s) in cf of MES_PERF_LOG_TIME,\x00\x04\x00\x00\x01A\x9D\xDD\xD9\x8D\x00\x00\x08\xD0fcap2\x00\x00=\xFC,1381922811424.cc6325e7896e3844d124321c315b02b0. into tmpdir=hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp, seqid=2655801231, totalSize=923.8m
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/3ae8adddd77f4a669b93eba2068f5778, keycount=34504344, bloomtype=NONE, size=501.3m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/055361f74fb644f1b3eb48f5da7e3ec0, keycount=18581704, bloomtype=NONE, size=269.5m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/a92cf779dca44843ab4c5130cca68d4f, keycount=9953404, bloomtype=NONE, size=143.9m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/51b89f361eb64f2f8c894a5c1516c604, keycount=664008, bloomtype=NONE, size=9.2m, encoding=NONE
2013-10-23 07:41:34,182 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp/16a28f8fc9f049c0a8b014502faf6339 with permission=rwxrwxrwx
2013-10-23 07:41:34,190 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
2013-10-23 07:41:34,191 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp/16a28f8fc9f049c0a8b014502faf6339: CompoundBloomFilterWriter
2013-10-23 07:43:16,793 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 51848ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-10-23 07:43:16,799 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60261ms for sessionid 0x1412996eb5d2170, closing socket connection and attempting reconnect
2013-10-23 07:43:16,800 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60184ms for sessionid 0x1412996eb5d217e, closing socket connection and attempting reconnect
2013-10-23 07:43:16,807 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50480,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3b17c1aa), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40987","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:16,807 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50159,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@71f59d1a), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40023","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:16,808 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50480,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@380e6d06), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40963","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:18,154 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server fchddn1.ctfab.com/10.16.10.181:2222. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-10-23 07:43:18,155 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to fchddn1.ctfab.com/10.16.10.181:2222, initiating session
2013-10-23 07:43:18,201 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2013-10-23 07:43:18,201 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused