-
hbase slave region server is terminated unexpectedly
lanfengcheng@...) 2012-11-09, 02:37
Hi,

Recently our HBase cluster has run into a problem: one of our HRegionServer nodes (a random one each time) stops unexpectedly. The cluster is under a highly concurrent read and write load. At first we thought this was caused by a configuration problem, so we tuned the HBase parameters, but the HRegionServer process still terminates after the tuning. Please help us with suggestions or solutions. Thank you.

Environment:
My HBase cluster contains 7 machines. One is the master and ZooKeeper server; the other 6 are the region servers.
The operating system is CentOS 5.6 with kernel 2.6.18-238.el5.
JDK version: 1.7.0_03
HBase version: 0.94.2
Hadoop version: 1.0.4

PARTS OF HBASE-SITE.XML CONFIGURATIONS:
----------------------------------------------------------------------
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>180000</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>2048000000</value>
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>512000000</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>50</value>
</property>

LOG BEGIN----------------------------------------------------------------------
REGION SERVER ERROR LOG:
2012-11-09 00:00:09,312 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 73.34 MB of total=623.37 MB
2012-11-09 00:00:09,354 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=73.4 MB, total=550.67 MB, single=154.11 MB, multi=463.94 MB, memory=0 KB
2012-11-09 00:00:27,770 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 73.39 MB of total=623.43 MB
2012-11-09 00:00:27,781 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=73.45 MB, total=549.98 MB, single=152.15 MB, multi=465.26 MB, memory=0 KB
2012-11-09 00:00:45,869 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 73.38 MB of total=623.41 MB
2012-11-09 00:00:45,880 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=73.4 MB, total=550.08 MB, single=154.92 MB, multi=462.54 MB, memory=0 KB
2012-11-09 00:00:58,576 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=1.0g
2012-11-09 00:00:58,580 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region taobao_redirect_table,20121107|\xE9\xA9\xBE\xE9\xA9\xB6\xE8\xAF\x81\xE5\xA5\x97 \xE6\x83\x85\xE4\xBE\xA3|4,1352378961446.9502c1830ad553d1e85497f9445d30d3. due to global heap pressure
2012-11-09 00:00:58,580 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for taobao_redirect_table,20121107|\xE9\xA9\xBE\xE9\xA9\xB6\xE8\xAF\x81\xE5\xA5\x97 \xE6\x83\x85\xE4\xBE\xA3|4,1352378961446.9502c1830ad553d1e85497f9445d30d3., current region memstore size 104.2m
2012-11-09 00:00:58,580 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting taobao_redirect_table,20121107|\xE9\xA9\xBE\xE9\xA9\xB6\xE8\xAF\x81\xE5\xA5\x97 \xE6\x83\x85\xE4\xBE\xA3|4,1352378961446.9502c1830ad553d1e85497f9445d30d3., commencing wait for mvcc, flushsize=109299760
2012-11-09 00:00:58,580 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2012-11-09 00:00:58,625 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file:hdfs://master:54310/hbase/taobao_redirect_table/9502c1830ad553d1e85497f9445d30d3/.tmp/c7099a89751b4bedb3e9554569169524with permission:rwxrwxrwx
2012-11-09 00:00:58,657 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
2012-11-09 00:00:58,657 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://master:54310/hbase/taobao_redirect_table/9502c1830ad553d1e85497f9445d30d3/.tmp/c7099a89751b4bedb3e9554569169524: CompoundBloomFilterWriter
2012-11-09 00:00:59,857 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://master:54310/hbase/taobao_redirect_table/9502c1830ad553d1e85497f9445d30d3/.tmp/c7099a89751b4bedb3e9554569169524)
2012-11-09 00:00:59,857 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=248607, memsize=104.2m, into tmp file hdfs://master:54310/hbase/taobao_redirect_table/9502c1830ad553d1e85497f9445d30d3/.tmp/c7099a89751b4bedb3e9554569169524
2012-11-09 00:00:59,864 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://master:54310/hbase/taobao_redirect_table/9502c1830ad553d1e85497f9445d30d3/.tmp/c7099a89751b4bedb3e9554569169524 to hdfs://master:54310/hbase/taobao_redirect_table/9502c1830ad553d1e85497f9445d30d3/redirect_info/c7099a89751b4bedb3e9554569169524
2012-11-09 00:00:59,915 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://master:54310/hbase/taobao_redirect_table/9502c1830ad553d1e85497f9445d30d3/redirect_info/c7099a89751b4bedb3e9554569169524, entries=421945, sequenceid=248607, filesize=32.4m
2012-11-09 00:00:59,917 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~104.2m/109299760, currentsize=0.0/0 for region taobao_redirect_table,20121107|\xE9\xA9\xBE\xE9\xA9\xB6\xE8\xAF\x81\xE5\xA5\x97 \xE6\x83\x85\xE4\xBE\xA3|4,1352378961446.9502c1830ad553d1e85497f9445d30d3. in 1336ms, sequenceid=248607, compaction requested=true
2012-11-09 00:00:59,917 DEBUG org.apache.hadoop.hbase.regionserver.Store: 9502c1830ad553d1e85497f9445d30d3 - redirect_info: Initiating minor
-
Re: hbase slave region server is terminated unexpectedly
ramkrishna vasudevan 2012-11-09, 08:47
What is your configured heap size? Check your GC logs. Most probably a full GC has happened.
Regards Ram
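
One way to follow up on the GC suggestion: a JVM pause longer than zookeeper.session.timeout (60000 ms in the posted configuration) typically costs the RegionServer its ZooKeeper session, after which the process aborts. The Python sketch below scans a GC log for pauses that long. It assumes the RegionServer JVM was started with -verbose:gc -XX:+PrintGCDetails, so that each GC event ends with a "[Times: user=... sys=..., real=N secs]" entry. The function name long_pauses and its default threshold are illustrative only and are not part of HBase.

```python
import re

# Wall-clock ("real") pause as printed by -XX:+PrintGCDetails, e.g.
#   [Times: user=0.10 sys=0.01, real=0.02 secs]
REAL_PAUSE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, session_timeout_ms=60000):
    """Return (line_number, pause_ms) for each GC event whose wall-clock
    pause is at least session_timeout_ms.

    The default of 60000 mirrors zookeeper.session.timeout from the posted
    hbase-site.xml; adjust it if the cluster uses a different value.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REAL_PAUSE.search(line)
        if match is None:
            continue  # not a GC timing line
        pause_ms = float(match.group(1)) * 1000.0
        if pause_ms >= session_timeout_ms:
            hits.append((number, pause_ms))
    return hits
```

To use it, pass any iterable of log lines, for example an open file handle for the RegionServer's GC log. An empty result means no single pause reached the timeout, which would point away from a full GC as the cause; a lower threshold (say 10000) is useful for spotting pauses that are growing toward the limit.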