Praveen Sripati 2011-06-24, 14:24
Doug Cutting 2011-06-24, 14:50
Re: Number of bytes per checksum
Kihwal Lee 2011-06-24, 14:59
Doing CRC32 on a huge data block also reduces its error detection
capability. If you need more information on this topic, this paper is
a good starting point:

http://www.ece.cmu.edu/~koopman/networks/dsn02/dsn02_koopman.pdf

Kihwal

On 6/24/11 9:50 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> A smaller checksum interval decreases the overhead for random access.
> If one seeks to a random location, one must, on average, read and
> checksum an extra checksumInterval/2 bytes. 512 was chosen as a value
> that, with four-byte CRC32, reduced the impact on small seeks while
> increasing the storage and transmission overheads by less than 1%.
>
> Increasing the interval would not likely reduce the computation
> significantly, as the same number of bytes are checksummed regardless,
> but it might optimize i/o operations in some cases without harming
> random access much if this were increased to 8k or larger.
>
> Doug
>
> On 06/24/2011 04:24 PM, Praveen Sripati wrote:
>>
>> Hi,
>>
>> Why is the checksum done for io.bytes.per.checksum (defaults to 512)
>> instead of the complete block at once (dfs.block.size defaults to
>> 67108864)? If a block is corrupt then the entire block has to be
>> replicated anyway. Isn't it more efficient to do the checksum for
>> complete block at once?
>>
>
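
To make the numbers in this thread concrete, here is a minimal Python sketch of per-interval CRC32 checksumming. It is not Hadoop's implementation: the helper names (chunk_checksums, storage_overhead, extra_bytes_on_seek, corrupt_chunks) are made up for illustration, and zlib.crc32 merely stands in for whatever CRC32 routine the real code uses. Only the 512-byte interval and the four-byte CRC32 size come from the messages above.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # io.bytes.per.checksum default quoted in the thread
CRC_SIZE = 4              # a CRC32 value occupies four bytes


def chunk_checksums(data, interval=BYTES_PER_CHECKSUM):
    """Return one CRC32 per `interval`-byte chunk of `data` (hypothetical helper)."""
    return [zlib.crc32(data[i:i + interval]) for i in range(0, len(data), interval)]


def storage_overhead(interval=BYTES_PER_CHECKSUM):
    """Fraction of extra bytes stored and transmitted for the checksums."""
    return CRC_SIZE / interval


def extra_bytes_on_seek(offset, interval=BYTES_PER_CHECKSUM):
    """Bytes before `offset` that must also be read, because verification
    has to start at the beginning of the chunk containing `offset`."""
    return offset % interval


def corrupt_chunks(data, expected, interval=BYTES_PER_CHECKSUM):
    """Indices of chunks whose CRC32 no longer matches the stored value."""
    actual = chunk_checksums(data, interval)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    print(f"overhead at 512 bytes: {storage_overhead():.4%}")
    offsets = range(BYTES_PER_CHECKSUM)
    avg = sum(extra_bytes_on_seek(o) for o in offsets) / len(offsets)
    print(f"average extra bytes per random seek: {avg}")
```

With these assumptions, the overhead is 4 / 512, or about 0.78%, which matches the "less than 1%" figure, and the average extra read per random seek is roughly half the interval. With one checksum over the whole 64 MB block, a seek into the middle of the block would require reading and checksumming everything before the seek position.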