HBase, mail # user - "Error recovery for block... failed because recovery from primary datanode failed 6 times"


RE: "Error recovery for block... failed because recovery from primary datanode failed 6 times"
Jonathan Gray 2011-02-14, 07:08
The DFS errors come after the server aborts.  What is in the log before the abort?  Nothing here shows a reason for it, which is unusual.

Anything in the master log?  Did the master time out this RS?  Are you running with replication = 1?
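
For anyone following along, here is a rough sketch of one way to pull out the entries that precede the abort from a regionserver log. It is not an HBase tool. The function names and the `context` parameter are made up for illustration, and it assumes the log uses the log4j timestamp layout seen in the quoted output below and that the abort entry contains the text "aborting server".

```python
import re
from collections import deque

# Start of a log entry in the layout quoted below,
# e.g. "2011-02-14 01:51:51,715 INFO ..." (assumed format).
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \w+")


def group_entries(lines):
    """Group raw lines into entries; lines without a timestamp
    (stack traces, wrapped text) are attached to the entry above them."""
    entry = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) and entry:
            yield "\n".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield "\n".join(entry)


def entries_before_abort(lines, context=50, marker="aborting server"):
    """Return up to `context` entries preceding the first entry that
    contains `marker`, or an empty list if the marker never appears."""
    recent = deque(maxlen=context)  # keeps only the newest `context` entries
    for entry in group_entries(lines):
        if marker in entry:
            return list(recent)
        recent.append(entry)
    return []
```

Opening the regionserver log and passing the file object to `entries_before_abort` gives the entries worth reading first. WARN, ERROR, or FATAL entries in that window are the likeliest place to find the reason for the abort.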

> -----Original Message-----
> From: Bradford Stephens [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, February 13, 2011 10:59 PM
> To: [EMAIL PROTECTED]
> Subject: "Error recovery for block... failed because recovery from primary
> datanode failed 6 times"
>
> Hey guys,
>
> I'm occasionally getting regionservers going down (running a late RC of .89
> that Ryan built). 5x c2.xlarge nodes (8gb/6 cores?) on EC2 with EBS drives.
>
> Here's the error message from the RS log. Hadoop fsck shows it's fine.
>
> Any ideas?
>
>
> 2011-02-14 01:51:51,715 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Closed mobile4-
> 2011021,20110122:37b16319-58e8-4809-bca6-83d7598a41dd:E84F9612-CE1A-
> 4FE1-AAE9-
> 2A7AF8C9B2F1:21519,1297657239532.d15ce98030138cad79e248e0845b70ee.
> 2011-02-14 01:51:51,715 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server
> at: ip-10-243-106-63.ec2.internal,60020,1297656774012
> 2011-02-14 01:51:51,711 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionCh
> ecker:
> regionserver60020.majorCompactionChecker exiting
> 2011-02-14 01:51:51,856 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x12e225ef5640002 closed
> 2011-02-14 01:51:51,856 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
> <ip-10-204-213-153.ec2.internal:/hbase,ip-10-243-106-
> 63.ec2.internal,60020,1297656773719>Closed
> connection with ZooKeeper; /hbase/root-region-server
> 2011-02-14 01:51:58,706 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread
> exiting
> 2011-02-14 01:51:58,706 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020
> exiting
> 2011-02-14 01:52:00,031 INFO org.apache.hadoop.hbase.Leases:
> regionserver60020.leaseChecker closing leases
> 2011-02-14 01:52:00,031 INFO org.apache.hadoop.hbase.Leases:
> regionserver60020.leaseChecker closed leases
> 2011-02-14 01:52:00,033 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
> starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-
> 10,5,main]
> 2011-02-14 01:52:00,033 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs
> shutdown hook thread.
> 2011-02-14 01:52:00,036 ERROR org.apache.hadoop.hdfs.DFSClient:
> Exception closing file
> /hbase-entest/.logs/ip-10-243-106-
> 63.ec2.internal,60020,1297656774012/10.243.106.63%3A60020.1297660376363
> : java.io.IOException: IOException flush:java.io.IOException:
> IOException flush:java.io.IOException: IOException
> flush:java.io.IOException: Error Recovery for block
> blk_208685344091455182_10263 failed  because recovery from primary
> datanode 10.243.106.63:50010 failed 6 times.  Pipeline was
> 10.243.106.63:50010. Aborting...
> java.io.IOException: IOException flush:java.io.IOException:
> IOException flush:java.io.IOException: IOException
> flush:java.io.IOException: Error Recovery for block
> blk_208685344091455182_10263 failed  because recovery from primary
> datanode 10.243.106.63:50010 failed 6 times.  Pipeline was
> 10.243.106.63:50010. Aborting...
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3
> 214)
> at
> org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:
> 97)
> at
> org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown
> Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(Se
> quenceFileLogWriter.java:123)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:906)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog.completeCacheFlush(HLog