All the HBase region servers going down
Dear all,

Please help me find out why all the region servers went down at 2013-11-25 16:20.

The logs listed below are from the master and one slave.
From the master:

2013-11-25 18:06:21,741 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: master,60000,1385363388874.timerUpdater exiting
2013-11-25 18:06:21,755 ERROR org.apache.hadoop.hbase.master.HMaster: Region server slave10,60020,1385363390188 reported a fatal error:
ABORTING region server slave10,60020,1385363390188: Unrecoverable exception while closing region productdevice,20131122-1-354890041701600,1385348706791.a587f1a15b4a3b10fc0e87a804487532., still finishing close
Cause:
org.apache.hadoop.hbase.DroppedSnapshotException: region: productdevice,20131122-1-354890041701600,1385348706791.a587f1a15b4a3b10fc0e87a804487532.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1605)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1479)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:992)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:956)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:119)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Connection reset by peer; Host Details : local host is: "slave10/192.168.1.210"; destination host is: "master":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
    at org.apache.hadoop.ipc.Client.call(Client.java:1241)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy16.getFileInfo(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy16.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:629)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1545)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:820)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:380)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1378)
    at org.apache.hadoop.hbase.regionserver.StoreFile$WriterBuilder.build(StoreFile.java:852)
    at org.apache.hadoop.hbase.regionserver.Store.createWriterInTmp(Store.java:924)
    at org.apache.hadoop.hbase.regionserver.Store.createWriterInTmp(Store.java:904)
    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:805)
    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:746)
    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2348)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1580)
    ... 8 more
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:56)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:420)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:276)
    at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
    at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
    at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:948)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:846)

2013-11-25 18:06:21,762 ERROR org.apache.hadoop.hbase.master.HMaster: Region server slave02,60020,1385363390113 reported a fatal error:
ABORTING region server slave02,60020,1385363390113: Unre