RE: Blocks are getting corrupted under very high load
Thanks Todd.

We have also started to suspect the same thing. We plan to capture the file details before and after the reboot.
With that analysis I can confirm whether it is the same issue or not.

One more thing to note: the gap between the reboot time and the last replica finalization is ~1 hr in some cases.
Since the machine was rebooted because of kernel.hung_task_timeout_secs, that particular thread may not have had a chance to sync the data at the OS level either.

HDFS-1539 is a great one. I have merged all the bug fixes, but since this one is filed as an improvement, it probably did not make it onto my list :(

We also found an OS-level setting that makes filesystem directory operations synchronous:
    All directory updates within the filesystem should be done synchronously. This affects the following system calls: creat, link, unlink, symlink, mkdir, rmdir, mknod and rename.
We mainly suspect that the rename operation was lost after the reboot, since the meta file and block file are renamed when a block is finalized from BBW to current. (At least, that is without considering the block size.)
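
To illustrate the suspicion above, here is a minimal sketch (not the DataNode's actual code) of what an application has to do on a POSIX/Linux filesystem if it wants a rename to survive a sudden reboot without relying on a synchronous-directory mount option (mount(8) describes the behaviour quoted above under dirsync). The helper name durable_rename, the "bbw" and "current" directory names, and the file name blk_example are hypothetical and used only for illustration.

```python
import os
import tempfile


def durable_rename(src, dst):
    """Rename src to dst and ask the kernel to persist both data and rename.

    Assumes a POSIX system (Linux) where fsync() on a directory file
    descriptor flushes that directory's entries.
    """
    # 1. Flush the file contents and metadata before moving it.
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. The rename itself only updates directory entries, which may
    #    still sit in the page cache or journal at this point.
    os.rename(src, dst)

    # 3. Flush the directories involved so the rename reaches disk.
    parents = {
        os.path.dirname(os.path.abspath(src)),
        os.path.dirname(os.path.abspath(dst)),
    }
    for parent in parents:
        dfd = os.open(parent, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        bbw = os.path.join(root, "bbw")          # hypothetical directory
        current = os.path.join(root, "current")  # hypothetical directory
        os.mkdir(bbw)
        os.mkdir(current)
        src = os.path.join(bbw, "blk_example")
        with open(src, "wb") as f:
            f.write(b"x" * 1024)
        durable_rename(src, os.path.join(current, "blk_example"))
```

If steps 1 and 3 are skipped, a crash shortly after the rename can leave the file in its old location, possibly with truncated data, which would match what we see after the reboot.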

Anyway, thanks a lot for the valuable time you have spent with us here. After checking the OS logs mentioned above, I will do a run with HDFS-1539.

From: Todd Lipcon [[EMAIL PROTECTED]]
Sent: Thursday, November 24, 2011 5:07 AM
Subject: Re: Blocks are getting corrupted under very high load

On Wed, Nov 23, 2011 at 1:23 AM, Uma Maheswara Rao G
> Yes, Todd, the block after the restart is smaller and its genstamp is also lower.
> A complete machine reboot happened here. The boards are configured so that if one does not get any CPU cycles for 480 secs, it reboots itself.
> kernel.hung_task_timeout_secs = 480 sec.

So sounds like the following happened:
- while writing file, the pipeline got reduced down to 1 node due to
timeouts from the other two
- soon thereafter (before more replicas were made), that last replica
kernel-panicked without syncing the data
- on reboot, the filesystem lost some edits from its ext3 journal, and
the block got moved back into the RBW directory, with truncated data
- HDFS did "the right thing" - at least what the algorithms say it
should do, because it had gotten a commitment for a later replica

If you have a build which includes HDFS-1539, you could consider
setting dfs.datanode.synconclose to true, which would have prevented
this problem.
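
For reference, a minimal sketch of that setting, assuming it goes into hdfs-site.xml on each DataNode like other dfs.datanode.* properties (the file location is an assumption about your deployment):

```xml
<!-- hdfs-site.xml on each DataNode; only has an effect on builds that include HDFS-1539 -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
```

Syncing each block file on close will likely cost some write throughput, so it is worth measuring under your load.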

