[Update] RE: Blocks are getting corrupted under very high load
>_______________________________________
>From: Uma Maheswara Rao G
>Sent: Thursday, November 24, 2011 7:51 AM
>To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: RE: Blocks are getting corrupted under very high load

We could replicate the issue with some test code (without Hadoop). The issue looks to be the same one you pointed out.
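A minimal Java sketch of the pattern such a test exercises (hypothetical paths and class name, not the actual test code or DataNode internals): a finalize-style rename whose durability depends on fsyncing both the file and its parent directories.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class FinalizeRenameDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical layout mimicking a DataNode volume.
        Path bbw = Paths.get("blocksBeingWritten/blk_0001");
        Path cur = Paths.get("current/blk_0001");
        Files.createDirectories(bbw.getParent());
        Files.createDirectories(cur.getParent());

        try (FileOutputStream out = new FileOutputStream(bbw.toFile())) {
            out.write(new byte[64 * 1024]);   // the replica's data
            out.getChannel().force(true);     // fsync the file contents
        }

        // "Finalize": rename from blocksBeingWritten to current. Without the
        // directory fsync below, the rename exists only in the ext3 journal,
        // and a power cut or kernel panic can roll it back, leaving the block
        // back in blocksBeingWritten (possibly truncated).
        Files.move(bbw, cur, StandardCopyOption.ATOMIC_MOVE);

        // Linux-specific idiom: fsync the parent directories to persist the rename.
        for (Path dir : new Path[] { bbw.getParent(), cur.getParent() }) {
            try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
                ch.force(true);
            }
        }
    }
}

Cutting power between the move and the directory fsync is exactly the window in which ext3 can lose the rename.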

>Thanks Todd.

>Finally, we also started suspecting that angle. We planned to capture the file details before the reboot and after the reboot.
>With that analysis I can confirm whether it is the same issue or not.

We cannot get the logs from before the reboot, because those logs are being lost as well :) . That in itself is further evidence.

>One more thing to notice is that the difference between the reboot time and the last replica finalization is ~1hr in some cases.
>Since the machine rebooted due to kernel.hung_task_timeout_secs, that particular thread might not have gotten the chance to sync the data in the OS either.
 
Same cause.
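For reference, that timeout is a sysctl; the two panic knobs below are assumptions about how such boards turn a hung task into a reboot, not values taken from this thread:

# /etc/sysctl.conf
kernel.hung_task_timeout_secs = 480   # flag tasks stuck in D-state for 480s
kernel.hung_task_panic = 1            # assumed: panic when such a task is found
kernel.panic = 10                     # assumed: reboot 10s after a panic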

>Great one, HDFS-1539. I have merged all the bugs. Since it is filed as an improvement, the issue might not have come up on my list :( .

>Also found some OS-level configs to make filesystem operations synchronous:
>dirsync
>    All directory updates within the filesystem should be done synchronously. This affects the following system calls: creat, link, unlink, symlink, mkdir, rmdir, mknod and rename.
>We mainly suspect that the rename operation is lost after the reboot, since the meta file and block file get renamed when the block is finalized from blocksBeingWritten to current (at least, not considering the block size).
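For reference, dirsync is a standard Linux mount option set per filesystem; a hypothetical /etc/fstab entry for a DataNode data disk (made-up device and mount point) would be:

/dev/sdb1   /data/dfs   ext3   defaults,dirsync   0 2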

After testing with dirsync, we found a major performance hit.

>Anyway, thanks a lot for your great and valuable time with us here. After checking the above OS logs, I will do a run with HDFS-1539.

That run also showed a performance hit. Presently we are planning to tune the client app to use fewer threads, to reduce the load on the OS, and to tune the data xceiver count at the DN (the current count is 4096, as the HBase team suggests). Obviously, the problem itself should still be rectified.
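For reference, on this Hadoop line the xceiver limit is an hdfs-site.xml property (note the historical misspelling in the property name); the value mirrors the 4096 cited above:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>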

>Regards,
>Uma

________________________________________
From: Todd Lipcon [[EMAIL PROTECTED]]
Sent: Thursday, November 24, 2011 5:07 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Blocks are getting corrupted under very high load

On Wed, Nov 23, 2011 at 1:23 AM, Uma Maheswara Rao G
<[EMAIL PROTECTED]> wrote:
> Yes, Todd, the block after restart is smaller and the genstamp is also lower.
>   Here a complete machine reboot happened. The boards are configured such that if a task gets no CPU cycles for 480 secs, the machine reboots itself.
>  kernel.hung_task_timeout_secs = 480 sec.

So sounds like the following happened:
- while writing file, the pipeline got reduced down to 1 node due to
timeouts from the other two
- soon thereafter (before more replicas were made), that last replica
kernel-paniced without syncing the data
- on reboot, the filesystem lost some edits from its ext3 journal, and
the block got moved back into the RBW directory, with truncated data
- hdfs did "the right thing" - at least what the algorithms say it
should do, because it had gotten a commitment for a later replica

If you have a build which includes HDFS-1539, you could consider
setting dfs.datanode.synconclose to true, which would have prevented
this problem.
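For reference, that setting is an hdfs-site.xml property on builds that include HDFS-1539:

<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>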

-Todd
--
Todd Lipcon
Software Engineer, Cloudera