NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> What cause region server to timeout other than long gc?


Re: What cause region server to timeout other than long gc?
Maybe you can try adding "-XX:+PrintGCApplicationStoppedTime". That flag logs the length of every stop-the-world (safepoint) pause, not only GC pauses, so if some operation other than GC caused the long safepoint, you will find it in the log.
BTW, was the load high during that time? :)

Best,
Liang
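With -XX:+PrintGCApplicationStoppedTime added to the region server's JVM options (for example via HBASE_OPTS in hbase-env.sh), HotSpot writes a line containing "Total time for which application threads were stopped: N seconds" after every safepoint. Below is a minimal Java sketch for pulling the long ones out of such a log. It assumes the GC output is piped to stdin, that seconds are printed with a '.' decimal separator, and that 5 seconds is a useful default threshold; the class name StoppedTimeScanner and the threshold are hypothetical, not part of HBase or the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans GC log output for long stop-the-world pauses.
public class StoppedTimeScanner {
    // Matches the message printed by -XX:+PrintGCApplicationStoppedTime.
    // Assumption: the decimal separator is '.', not a locale-specific ','.
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: ([0-9]+(?:\\.[0-9]+)?) seconds");

    // Returns every stopped time (in seconds) that is at or above the threshold.
    static List<Double> longPauses(Iterable<String> lines, double thresholdSeconds) {
        List<Double> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                double seconds = Double.parseDouble(m.group(1));
                if (seconds >= thresholdSeconds) {
                    result.add(seconds);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: report pauses of 5 seconds or more.
        double threshold = args.length > 0 ? Double.parseDouble(args[0]) : 5.0;
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        for (double s : longPauses(lines, threshold)) {
            System.out.printf("stopped for %.3f s%n", s);
        }
    }
}
```

For example, `java StoppedTimeScanner 5 < gc.log`, where gc.log is wherever the JVM's GC output is written. A long stop with no GC event logged around the same timestamp points to a non-GC safepoint, which is the case the flag is meant to expose.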
________________________________________
From: Henry Hung [[EMAIL PROTECTED]]
Sent: October 23, 2013 11:41
To: [EMAIL PROTECTED]
Subject: What cause region server to timeout other than long gc?

Hi All,

Today one of our 9 region servers went down because of a ZooKeeper session timeout. This is the log:
2013-10-23 07:41:34,139 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 4 file(s) in cf of MES_PERF_LOG_TIME,\x00\x04\x00\x00\x01A\x9D\xDD\xD9\x8D\x00\x00\x08\xD0fcap2\x00\x00=\xFC,1381922811424.cc6325e7896e3844d124321c315b02b0. into tmpdir=hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp, seqid=2655801231, totalSize=923.8m
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/3ae8adddd77f4a669b93eba2068f5778, keycount=34504344, bloomtype=NONE, size=501.3m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/055361f74fb644f1b3eb48f5da7e3ec0, keycount=18581704, bloomtype=NONE, size=269.5m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/a92cf779dca44843ab4c5130cca68d4f, keycount=9953404, bloomtype=NONE, size=143.9m, encoding=NONE
2013-10-23 07:41:34,139 DEBUG org.apache.hadoop.hbase.regionserver.Compactor: Compacting hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/cf/51b89f361eb64f2f8c894a5c1516c604, keycount=664008, bloomtype=NONE, size=9.2m, encoding=NONE
2013-10-23 07:41:34,182 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp/16a28f8fc9f049c0a8b014502faf6339 with permission=rwxrwxrwx
2013-10-23 07:41:34,190 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
2013-10-23 07:41:34,191 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://fchddn1.ctfab.com:9000/hbase/MES_PERF_LOG_TIME/cc6325e7896e3844d124321c315b02b0/.tmp/16a28f8fc9f049c0a8b014502faf6339: CompoundBloomFilterWriter
2013-10-23 07:43:16,793 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 51848ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-10-23 07:43:16,799 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60261ms for sessionid 0x1412996eb5d2170, closing socket connection and attempting reconnect
2013-10-23 07:43:16,800 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60184ms for sessionid 0x1412996eb5d217e, closing socket connection and attempting reconnect
2013-10-23 07:43:16,807 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50480,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3b17c1aa), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40987","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:16,807 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50159,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@71f59d1a), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40023","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:16,808 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50480,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@380e6d06), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.16.10.181:40963","starttimems":1382485346321,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-10-23 07:43:18,154 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server fchddn1.ctfab.com/10.16.10.181:2222. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-10-23 07:43:18,155 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to fchddn1.ctfab.com/10.16.10.181:2222, initiating session
2013-10-23 07:43:18,201 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2013-10-23 07:43:18,201 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
2013-10-23 07:43:18,230 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-10-23 07:43:18,230 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x1412996eb5d217e has expired, closing socket connection
2013-10-23 07:43:18,804 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server fchddn1.ctfab.com/10.16.10.181:2222. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-10-23 07:43:18,805 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to fchddn1.ctfab.com/10.16.10.181:2222, i
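The first warning in the log ("We slept 51848ms instead of 3000ms") comes from a thread that expects to sleep for 3000 ms and measures how long it actually slept. A large overshoot means the whole process was stalled, not just one request, whether by GC or by something else such as the host swapping. A minimal Java sketch of that idea follows; it is not HBase's actual Sleeper code, and the class name PauseDetector and the 10-second warning threshold are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical pause detector illustrating the idea behind the
// "We slept Xms instead of Yms" warning; this is not HBase's Sleeper code.
public class PauseDetector implements Runnable {
    private final long periodMs;         // how long each sleep is meant to last
    private final long warnThresholdMs;  // hypothetical: overshoot that counts as a pause
    private final LongConsumer onPause;  // receives the observed sleep length in ms
    private volatile boolean running = true;

    public PauseDetector(long periodMs, long warnThresholdMs, LongConsumer onPause) {
        this.periodMs = periodMs;
        this.warnThresholdMs = warnThresholdMs;
        this.onPause = onPause;
    }

    // True when a sleep that should have lasted periodMs overshot it by more than the threshold.
    static boolean isPause(long periodMs, long observedMs, long warnThresholdMs) {
        return observedMs - periodMs > warnThresholdMs;
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime(); // monotonic, unaffected by wall-clock adjustments
            try {
                Thread.sleep(periodMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long observedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (isPause(periodMs, observedMs, warnThresholdMs)) {
                onPause.accept(observedMs);
            }
        }
    }

    public void stop() {
        running = false;
    }

    public static void main(String[] args) throws InterruptedException {
        long period = 3000;
        PauseDetector detector = new PauseDetector(period, 10_000,
            observed -> System.err.printf(
                "slept %d ms instead of %d ms; the process was probably stalled%n", observed, period));
        Thread t = new Thread(detector, "pause-detector");
        t.setDaemon(true); // do not keep the JVM alive just for the detector
        t.start();
        Thread.sleep(30_000); // stand-in for the real server's work
        detector.stop();
    }
}
```

Using System.nanoTime rather than wall-clock time keeps an NTP clock adjustment from being mistaken for a pause. Note that the check cannot tell what caused the stall, which is why the HBase warning only says it is "likely" a GC pause.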
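The timestamps in the log can be lined up to check whether the slow RPCs and the Sleeper warning describe the same stall. The small java.time sketch below assumes the log's local time is UTC+8 (an assumption, though the epoch values line up with it) and that the Sleeper's figure is the elapsed time since its sleep began. It converts the starttimems/processingtimems fields of the first responseTooSlow entry and back-computes when that sleep started; the class name PauseTimeline is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class PauseTimeline {
    // Assumption: the region server logs local time at UTC+8.
    static final ZoneOffset LOG_OFFSET = ZoneOffset.ofHours(8);

    // Start and end of an RPC from the starttimems and processingtimems fields of a responseTooSlow entry.
    static OffsetDateTime[] rpcWindow(long startTimeMs, long processingTimeMs) {
        OffsetDateTime start = Instant.ofEpochMilli(startTimeMs).atOffset(LOG_OFFSET);
        return new OffsetDateTime[] {start, start.plus(Duration.ofMillis(processingTimeMs))};
    }

    // When a sleep began, given the timestamp of the warning and the reported sleep length.
    static LocalDateTime sleepStart(LocalDateTime warnedAt, long sleptMs) {
        return warnedAt.minus(Duration.ofMillis(sleptMs));
    }

    public static void main(String[] args) {
        // Values copied from the 07:43:16,807 responseTooSlow line and the 07:43:16,793 Sleeper warning.
        OffsetDateTime[] rpc = rpcWindow(1382485346321L, 50480L);
        LocalDateTime sleeperStart = sleepStart(LocalDateTime.parse("2013-10-23T07:43:16.793"), 51848L);
        System.out.println("slow multi() RPC: " + rpc[0] + " -> " + rpc[1]);
        System.out.println("Sleeper sleep started: " + sleeperStart);
    }
}
```

It prints 2013-10-23T07:42:26.321+08:00 -> 2013-10-23T07:43:16.801+08:00 for the RPC and 2013-10-23T07:42:24.945 for the start of the sleep. The roughly 50 s multi() calls therefore ran entirely inside the roughly 52 s window in which the Sleeper thread overslept, which is consistent with a process-wide pause rather than slow individual requests.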