Hadoop >> mail # user >> file checksum


Rita 2012-06-25, 11:29
Kai Voigt 2012-06-25, 11:33
Re: file checksum
What is the parameter I can use to make the check run more often, say every 3 days?

On Mon, Jun 25, 2012 at 7:33 AM, Kai Voigt <[EMAIL PROTECTED]> wrote:

> HDFS has block checksums. Whenever a block is written to the datanodes, a
> checksum is calculated and written with the block to the datanodes' disks.
>
> Whenever a block is requested, the block's checksum is verified against
> the stored checksum. If they don't match, that block is corrupt. But since
> there are additional replicas of the block, chances are high that at least
> one replica matches the checksum. Corrupt blocks will be scheduled to be
> rereplicated.
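
The verify-on-read behaviour described above can be illustrated with a toy sketch in Python. This is not HDFS code: the 512-byte chunk size, the use of CRC32, and the names compute_checksums, verify_block and read_with_fallback are all assumptions made for illustration only.

```python
import zlib

# Assumed chunk size for this sketch; one checksum is kept per chunk.
BYTES_PER_CHECKSUM = 512


def compute_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> list:
    """Return one CRC32 value per chunk of the block data (done at write time)."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def verify_block(data: bytes, stored: list, chunk_size: int = BYTES_PER_CHECKSUM) -> bool:
    """Recompute the checksums on read and compare them with the stored ones."""
    return compute_checksums(data, chunk_size) == stored


def read_with_fallback(replicas: list, stored: list) -> tuple:
    """Return the first replica that verifies and the indexes of corrupt replicas.

    In a real cluster the corrupt replicas would be reported so that the block
    can be re-replicated; here they are only collected in a list.
    """
    corrupt = []
    for index, data in enumerate(replicas):
        if verify_block(data, stored):
            return data, corrupt
        corrupt.append(index)
    raise IOError("all replicas failed checksum verification")
```

For example, flipping a single bit in one replica makes that replica fail verification, and the read is served from the next replica that still matches the stored checksums.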
>
> Also, to prevent bit rot, blocks are checked periodically in the
> background (weekly by default, I believe; that period is configurable).
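
As a hedged sketch for the 3-day question: the background scan period is, to my knowledge, controlled by the datanode property dfs.datanode.scan.period.hours in hdfs-site.xml. That property name is not given anywhere in this thread, so treat it as an assumption and check it against the hdfs-default.xml shipped with your Hadoop version. The helper scan_period_property below is hypothetical; it only converts days to hours and prints the XML fragment.

```python
from xml.sax.saxutils import escape

# Assumed property name; verify it in hdfs-default.xml for your Hadoop version.
SCAN_PERIOD_PROPERTY = "dfs.datanode.scan.period.hours"


def scan_period_property(days: int) -> str:
    """Build an hdfs-site.xml <property> fragment for a scan period given in days."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    hours = days * 24  # the property is expressed in hours
    return (
        "<property>\n"
        f"  <name>{escape(SCAN_PERIOD_PROPERTY)}</name>\n"
        f"  <value>{hours}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Print the fragment for a 3-day scan period (72 hours).
    print(scan_period_property(3))
```

A shorter period means every block is re-read more often, so expect extra background disk I/O on the datanodes.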
>
> Kai
>
> On 25.06.2012 at 13:29, Rita wrote:
>
> > Does Hadoop, HDFS in particular, do any sanity checks of the files before
> > and after balancing/copying/reading them? We have 20TB of data and I
> > want to make sure that after these operations are completed the data is
> > still in good shape. Where can I read about this?
> >
> > tia
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
>
> --
> Kai Voigt
> [EMAIL PROTECTED]
>
--
--- Get your facts first, then you can distort them as you please.--