Since you're using CDH3 (a 1.x-derived distribution), you are using the old
checksum implementations written in pure Java.
In Hadoop 2.0 (and CDH4), we have JNI-based checksumming that uses the
hardware CRC32 instruction introduced with Intel's Nehalem processors
(SSE4.2). This is several times faster than the pure-Java path.
My guess is that this accounts for the substantial difference. You could
try re-running your test on a newer version to confirm.
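As a rough way to see the hardware-vs-software gap on your own machine (a sketch, not Hadoop's actual JNI code path): on Java 9+, `java.util.zip.CRC32C` is JIT-intrinsified to the SSE4.2 `crc32` instruction on x86, while older pure-Java CRC code runs table-driven in software. The snippet below just checksums a 64 KB buffer, the same chunk size as in your test, with both classes; timing it in a loop will show the difference on supported hardware.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;

public class ChecksumDemo {
    public static void main(String[] args) {
        // 64 KB buffer, matching the 64k reads in the test.
        byte[] buf = new byte[64 * 1024];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) i;
        }

        // Classic CRC32 (zlib polynomial).
        CRC32 crc32 = new CRC32();
        crc32.update(buf, 0, buf.length);

        // CRC32C (Castagnoli polynomial), Java 9+; intrinsified to the
        // SSE4.2 crc32 instruction on x86 CPUs from Nehalem onward.
        CRC32C crc32c = new CRC32C();
        crc32c.update(buf, 0, buf.length);

        System.out.println("CRC32:  " + Long.toHexString(crc32.getValue()));
        System.out.println("CRC32C: " + Long.toHexString(crc32c.getValue()));
    }
}
```

Note this only approximates the effect: Hadoop 2's native checksumming goes through its own JNI library (libhadoop), not `java.util.zip`, but the underlying hardware instruction is the same.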
On Sat, Sep 15, 2012 at 7:13 AM, jlei liu <[EMAIL PROTECTED]> wrote:
> I read 64k data from file every time.
Software Engineer, Cloudera