HDFS >> mail # dev >> DataBlockScanner scan period

Re: DataBlockScanner scan period

On Nov 23, 2010, at 7:41 PM, Thanh Do wrote:

> Sorry for digging up this old thread.
> Brian, is this the reason you want to add a "data-level" scan
> to HDFS, as in HDFS-221?
> It seems to me that a very rarely read block could
> be silently corrupted, because the DataBlockScanner
> never finishes its scanning job in 3 weeks...

Why?  What if you restarted your datanode once every 2 weeks?  Last I checked, HDFS randomly assigned each block a verification time somewhere within the scan period.  Because HDFS also rate-limits the scanner, if you have too many blocks for the scan period, you can easily come up with a case where some blocks never get verified.
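
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.  It is not taken from the HDFS source: the function names, the 1 MiB/s scan rate, the block count and the block size are all hypothetical values picked for illustration, and it assumes that a datanode restart throws away whatever scan progress had been made.

```python
# Back-of-the-envelope model, not HDFS code. All names and numbers are
# hypothetical and chosen only to illustrate the argument.

SECONDS_PER_DAY = 86_400


def full_scan_seconds(num_blocks, avg_block_bytes, scan_bytes_per_sec):
    """Time needed to read every block once at the throttled scan rate."""
    return num_blocks * avg_block_bytes / scan_bytes_per_sec


def fraction_verified(num_blocks, avg_block_bytes, scan_bytes_per_sec, window_seconds):
    """Upper bound on the share of blocks verified within one window.

    The window is the scan period, or the time between datanode restarts
    if that is shorter (assuming a restart discards scan progress).
    """
    needed = full_scan_seconds(num_blocks, avg_block_bytes, scan_bytes_per_sec)
    return min(1.0, window_seconds / needed)


if __name__ == "__main__":
    blocks = 100_000          # hypothetical block count on one datanode
    block_size = 64 * 2**20   # hypothetical 64 MiB average block size
    rate = 2**20              # hypothetical 1 MiB/s scan throttle

    three_weeks = 21 * SECONDS_PER_DAY
    two_weeks = 14 * SECONDS_PER_DAY

    print(fraction_verified(blocks, block_size, rate, three_weeks))
    print(fraction_verified(blocks, block_size, rate, two_weeks))
```

With those made-up inputs, fewer than a third of the blocks could be read in a 3-week period, and restarting every 2 weeks shrinks that further.  Which blocks get left out depends on the random assignment, so a rarely read block has no guarantee of being among the ones verified.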

The reason one wants a data-level scan is so that an admin can manually verify that all copies of a file are good (well, "good" compared to the checksum... maybe the user corrupted it before uploading it :).  It'd be a great debugging tool to put site admins' minds at ease.