Re: Data loss on EMR cluster running Hadoop and Hive
Michael Segel 2012-09-04, 16:43
Next time, try reading and writing to S3 directly from your Hive job.
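
For example, an EMR Hive job can write its output table straight to S3 by
giving the table an S3 LOCATION, along these lines (a rough sketch; the
bucket, paths, and table names are made up):

  # invoked from the shell on the master node; names below are hypothetical
  hive -e "
    CREATE EXTERNAL TABLE results (id BIGINT, val STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3://my-bucket/hive/results/';

    INSERT OVERWRITE TABLE results
    SELECT id, val FROM staging;
  "

That way the job's output never depends on the cluster's HDFS blocks, so a
bad or missing block on a transient cluster can't take the data with it.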

Not sure why the block was bad... What did the AWS folks have to say?

-Mike

On Sep 4, 2012, at 11:30 AM, Max Hansmire <[EMAIL PROTECTED]> wrote:

> I ran into an issue yesterday where one of the blocks on HDFS seems to
> have gone away. I would appreciate any help that you can provide.
>
> I am running Hadoop on Amazon Elastic MapReduce (EMR), with Hadoop
> version 0.20.205 and Hive version 0.8.1.
>
> I have a Hive table that is written out in the reduce step of a
> map-reduce job created by Hive. This step completed with no errors, but
> the next map-reduce job that tried to read the table failed with the
> following error message.
>
> "Caused by: java.io.IOException: No live nodes contain current block"
>
> I ran hadoop fs -cat on the same file and got the same error.
>
> Looking more closely at the data node and name node logs, I see this
> error for the same problem block. It appears in the data node log when
> the block is read.
>
> 2012-09-03 11:56:05,054 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode
> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
> DatanodeRegistration(10.193.39.159:9200,
> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
> infoPort=9102, ipcPort=9201):sendBlock() :  Offset 134217727 and
> length 1 don't match block blk_-7100869813617535842_5426 ( blockLen
> 120152064 )
> 2012-09-03 11:56:05,054 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode
> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
> DatanodeRegistration(10.193.39.159:9200,
> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
> infoPort=9102, ipcPort=9201):Got exception while serving
> blk_-7100869813617535842_5426 to /10.96.57.112:
> java.io.IOException:  Offset 134217727 and length 1 don't match block
> blk_-7100869813617535842_5426 ( blockLen 120152064 )
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> at java.lang.Thread.run(Thread.java:662)
>
> 2012-09-03 11:56:05,054 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode
> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
> DatanodeRegistration(10.193.39.159:9200,
> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
> infoPort=9102, ipcPort=9201):DataXceiver
> java.io.IOException:  Offset 134217727 and length 1 don't match block
> blk_-7100869813617535842_5426 ( blockLen 120152064 )
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> at java.lang.Thread.run(Thread.java:662)
>
> Unfortunately the EMR cluster that had the data on it has since been
> terminated. I have access to the logs, but I can't run an fsck. I can
> provide more detailed stack traces etc. if you think it would be
> helpful. Rerunning my process by re-generating the corrupted block
> resolved the issue.
>
> I would really appreciate it if anyone has a reasonable explanation of
> what happened and how to avoid it in the future.
>
> Max
>
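
For future reference, one way to catch this kind of problem earlier is to
check block health while the EMR cluster is still alive, before it is
terminated, for example (the path here is just a placeholder):

  # run on the master node against the suspect warehouse directory
  hadoop fsck /user/hive/warehouse/my_table -files -blocks -locations

If fsck reports missing or corrupt blocks at that point, the output can be
regenerated while the source data and the cluster are still around.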