HDFS >> mail # dev >> DataBlockScanner scan period


Re: DataBlockScanner scan period

On Nov 23, 2010, at 7:41 PM, Thanh Do wrote:

> sorry for digging up this old thread.
>
> Brian, is this the reason you want to add a "data-level" scan
> to HDFS, as in HDFS-221?
>
> It seems to me that a very rarely read block could
> be silently corrupted, because the DataBlockScanner
> never finishes its scanning job in 3 weeks...
>
>

Why?  What if you restarted your datanode once every 2 weeks?  Last I checked, HDFS randomly assigned blocks to be verified throughout a time interval.  Because HDFS also rate-limits the scanner, if you have too many blocks for that interval, you can easily come up with a case where some blocks never get verified.
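
As a rough back-of-the-envelope sketch of that last point, the arithmetic can be written out as below. The function names and the example figures (an 8 MiB/s scan rate limit and 20 TiB of block data on one datanode) are assumptions made purely for illustration, not values taken from HDFS; only the 3-week period comes from this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_verifiable_bytes(rate_bytes_per_sec, period_days):
    """Upper bound on the bytes a rate-limited scanner can read in one period."""
    return rate_bytes_per_sec * period_days * SECONDS_PER_DAY


def scan_fits(total_block_bytes, rate_bytes_per_sec, period_days):
    """True if every byte of block data could be read within one scan period."""
    return total_block_bytes <= max_verifiable_bytes(rate_bytes_per_sec, period_days)


def days_to_scan(total_block_bytes, rate_bytes_per_sec):
    """Days needed to read all block data once at the given rate limit."""
    return total_block_bytes / (rate_bytes_per_sec * SECONDS_PER_DAY)


if __name__ == "__main__":
    MIB = 2 ** 20
    TIB = 2 ** 40

    rate = 8 * MIB        # assumed rate limit, for illustration only
    period_days = 21      # the 3-week period mentioned in this thread
    stored = 20 * TIB     # assumed amount of block data on one datanode

    print("fits in one period:", scan_fits(stored, rate, period_days))
    print("days for a full pass: %.1f" % days_to_scan(stored, rate))
```

With these assumed numbers a full pass takes longer than the period, so some blocks would not be verified in time. This is a best case: it ignores restarts and any time the scanner spends idle.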

The reason one wants a data-level scan is so the admin can manually verify that all copies of a file are good (well, "good" compared to the checksum... maybe the user corrupted it before uploading it :).  It'd be a great debugging tool to put site admins' minds at ease.

Brian