HDFS >> mail # dev >> Number of bytes per checksum


Praveen Sripati 2011-06-24, 14:24
Re: Number of bytes per checksum
A smaller checksum interval decreases the overhead for random access.
If one seeks to a random location, one must, on average, read and
checksum an extra checksumInterval/2 bytes.  512 was chosen as a value
that, with four-byte CRC32, reduced the impact on small seeks while
increasing the storage and transmission overheads by less than 1%.
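
As a rough check of those figures, here is a small Python sketch that
computes the storage overhead and the expected extra bytes read per
random seek for a few intervals. The function names and the list of
intervals are illustrative assumptions, not anything taken from the
Hadoop code.

```python
CRC32_BYTES = 4  # one CRC32 checksum is four bytes


def storage_overhead(bytes_per_checksum, checksum_bytes=CRC32_BYTES):
    """Fraction of extra storage/transfer added by the checksums."""
    return checksum_bytes / bytes_per_checksum


def expected_extra_read(bytes_per_checksum):
    """Average extra bytes read and checksummed for one random seek."""
    return bytes_per_checksum / 2


# 512 is the default discussed here; 8k and a full 64 MB block are
# included only for comparison.
for interval in (512, 8 * 1024, 64 * 1024 * 1024):
    print(f"{interval:>10} bytes/checksum: "
          f"overhead {storage_overhead(interval):.4%}, "
          f"extra read per seek ~{expected_extra_read(interval):.0f} bytes")
```

With one checksum per 64 MB block the storage overhead is negligible,
but a random seek would on average have to read and checksum about
half a block.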

Increasing the interval would probably not reduce the computation
significantly, since the same number of bytes is checksummed either way.
Raising it to 8k or larger might, however, optimize I/O operations in
some cases without harming random access much.
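
To illustrate the trade-off, here is a minimal sketch using Python's
zlib.crc32. It is not the HDFS implementation; chunk_checksums and
verify_range are hypothetical helpers, and the 64 KiB buffer stands in
for a block. It shows that a full read checksums the same number of
bytes at any interval, while a small read must verify every chunk it
touches.

```python
import zlib


def chunk_checksums(data, bytes_per_checksum):
    """Compute one CRC32 per fixed-size chunk of data."""
    return [zlib.crc32(data[i:i + bytes_per_checksum])
            for i in range(0, len(data), bytes_per_checksum)]


def verify_range(data, checksums, offset, length, bytes_per_checksum):
    """Verify the chunks covering [offset, offset + length), length > 0.

    Returns the number of bytes that had to be checksummed.
    Raises ValueError if a chunk does not match its stored CRC32.
    """
    first = offset // bytes_per_checksum
    last = (offset + length - 1) // bytes_per_checksum
    verified = 0
    for idx in range(first, last + 1):
        start = idx * bytes_per_checksum
        chunk = data[start:start + bytes_per_checksum]
        if zlib.crc32(chunk) != checksums[idx]:
            raise ValueError(f"checksum mismatch in chunk {idx}")
        verified += len(chunk)
    return verified


data = bytes(range(256)) * 256  # 64 KiB of sample data (assumption)
for interval in (512, 8 * 1024):
    sums = chunk_checksums(data, interval)
    full = verify_range(data, sums, 0, len(data), interval)
    small = verify_range(data, sums, 1000, 100, interval)
    print(f"interval {interval}: full read checksums {full} bytes, "
          f"100-byte read at offset 1000 checksums {small} bytes")
```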

Doug

On 06/24/2011 04:24 PM, Praveen Sripati wrote:
>
> Hi,
>
> Why is the checksum done for io.bytes.per.checksum (defaults to 512)
> instead of the complete block at once (dfs.block.size defaults to
> 67108864)? If a block is corrupt then the entire block has to be
> replicated anyway. Isn't it more efficient to do the checksum for
> the complete block at once?
>
Kihwal Lee 2011-06-24, 14:59