Re: Data loss on EMR cluster running Hadoop and Hive
Next time, try reading and writing to S3 directly from your Hive job.

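Something along these lines should work. This is a rough sketch only:
the table, column, and bucket names below are made up, and you would
adjust the schema and row format to match your real data.

  -- Sketch: an external Hive table backed directly by S3.
  -- Table, column, and bucket names are placeholders.
  CREATE EXTERNAL TABLE events_s3 (
    id BIGINT,
    payload STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION 's3://your-bucket/warehouse/events_s3/';

  -- The reduce output then lands in S3 instead of HDFS, so a lost
  -- HDFS block cannot take the finished data with it.
  INSERT OVERWRITE TABLE events_s3
  SELECT id, payload
  FROM events_staging;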
Not sure why the block was bad... One detail jumps out of your logs,
though: the failing offset, 134217727, is exactly one byte short of
the default 128 MB block size (128 * 1024 * 1024 = 134217728), while
the datanode reports a blockLen of only 120152064. It looks as if the
reader expected a full block but the replica on disk was truncated.
What did the AWS folks have to say?

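Also, if this happens again while the cluster is still up, it is worth
running fsck against the table's directory before terminating, to see
which blocks are corrupt or missing and where their replicas live.
Roughly (the warehouse path here is a placeholder):

  hadoop fsck /user/hive/warehouse/your_table -files -blocks -locations
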
-Mike

On Sep 4, 2012, at 11:30 AM, Max Hansmire <[EMAIL PROTECTED]> wrote:

> I ran into an issue yesterday where one of the blocks on HDFS seems to
> have gone away. I would appreciate any help that you can provide.
>
> I am running Hadoop on Amazon's Elastic MapReduce (EMR), with
> Hadoop version 0.20.205 and Hive version 0.8.1.
>
> I have a Hive table that is written out in the reduce step of a
> map-reduce job generated by Hive. That step completed with no errors,
> but the next map-reduce job that tried to read the table failed with
> the following error message.
>
> "Caused by: java.io.IOException: No live nodes contain current block"
>
> I ran hadoop fs -cat on the same file and got the same error.
>
> Looking more closely at the data node and name node logs, I see the
> following error for the problem block. It appears in the data node
> log when the block is read.
>
> 2012-09-03 11:56:05,054 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode
> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
> DatanodeRegistration(10.193.39.159:9200,
> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
> infoPort=9102, ipcPort=9201):sendBlock() :  Offset 134217727 and
> length 1 don't match block blk_-7100869813617535842_5426 ( blockLen
> 120152064 )
> 2012-09-03 11:56:05,054 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode
> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
> DatanodeRegistration(10.193.39.159:9200,
> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
> infoPort=9102, ipcPort=9201):Got exception while serving
> blk_-7100869813617535842_5426 to /10.96.57.112:
> java.io.IOException:  Offset 134217727 and length 1 don't match block
> blk_-7100869813617535842_5426 ( blockLen 120152064 )
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> at java.lang.Thread.run(Thread.java:662)
>
> 2012-09-03 11:56:05,054 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode
> (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0):
> DatanodeRegistration(10.193.39.159:9200,
> storageID=DS-2147477684-10.193.39.159-9200-1346659207926,
> infoPort=9102, ipcPort=9201):DataXceiver
> java.io.IOException:  Offset 134217727 and length 1 don't match block
> blk_-7100869813617535842_5426 ( blockLen 120152064 )
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> at java.lang.Thread.run(Thread.java:662)
>
> Unfortunately the EMR cluster that had the data on it has since been
> terminated. I have access to the logs, but I can't run an fsck. I can
> provide more detailed stack traces etc. if you think it would be
> helpful. Rerunning my process to regenerate the data in the
> corrupted block resolved the issue.
>
> I would really appreciate it if anyone has a reasonable explanation
> of what happened and how to avoid it in the future.
>
> Max
>