All HBase region servers going down
Dear all,

Please help me find out why all the region servers went down at 2013-11-25 16:20.

The logs below are from the master and one slave.
From the master:

2013-11-25 18:06:21,741 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: master,60000,1385363388874.timerUpdater exiting
2013-11-25 18:06:21,755 ERROR org.apache.hadoop.hbase.master.HMaster: Region server ^@^@slave10,60020,1385363390188 reported a fatal error:
ABORTING region server slave10,60020,1385363390188: Unrecoverable exception while closing region productdevice,20131122-1-354890041701600,1385348706791.a587f1a15b4a3b10fc0e87a804487532., still finishing close
Cause:
org.apache.hadoop.hbase.DroppedSnapshotException: region: productdevice,20131122-1-354890041701600,1385348706791.a587f1a15b4a3b10fc0e87a804487532.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1605)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1479)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:992)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:956)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:119)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Connection reset by peer; Host Details : local host is: "slave10/192.168.1.210"; destination host is: "master":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
    at org.apache.hadoop.ipc.Client.call(Client.java:1241)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy16.getFileInfo(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy16.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:629)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1545)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:820)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:380)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1378)
    at org.apache.hadoop.hbase.regionserver.StoreFile$WriterBuilder.build(StoreFile.java:852)
    at org.apache.hadoop.hbase.regionserver.Store.createWriterInTmp(Store.java:924)
    at org.apache.hadoop.hbase.regionserver.Store.createWriterInTmp(Store.java:904)
    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:805)
    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:746)
    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2348)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1580)
    ... 8 more
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:56)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:420)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:276)
    at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
    at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
    at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:948)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:846)

2013-11-25 18:06:21,762 ERROR org.apache.hadoop.hbase.master.HMaster: Region server ^@^@slave02,60020,1385363390113 reported a fatal error:
ABORTING region server slave02,60020,1385363390113: Unre