MapReduce >> mail # user >> HDFS dfs.client.read.shortcircuit.skip.checksum


jlei liu 2012-09-15, 10:12
Re: HDFS dfs.client.read.shortcircuit.skip.checksum
Hi LiuLei,

Since you're using CDH3 (a 1.x derived distribution) you are using the old
checksum implementations written in Java.

In Hadoop 2.0 (or CDH4), we have JNI-based checksumming which uses the
hardware CRC32 instruction (SSE4.2, available on Nehalem and later Intel
CPUs). This is several times faster than the Java implementation.

My guess is that this accounts for the substantial difference. You could
try re-running your test on a newer version to confirm.
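
For a rough sense of how much raw checksumming costs at a 64k read size, independent of HDFS, something like the sketch below may help. It is not Hadoop's checksum code: it times the JDK's own java.util.zip.CRC32 and CRC32C (the latter assumes Java 9 or later) over 64k chunks of an in-memory buffer. The class name ChecksumCost, the 64 MB buffer size, and the five rounds are arbitrary choices for illustration.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumCost {
    // Read size taken from the thread: 64k per read.
    static final int CHUNK = 64 * 1024;

    // Checksum one slice of the buffer and return the checksum value.
    static long checksumOf(Checksum c, byte[] buf, int off, int len) {
        c.reset();
        c.update(buf, off, len);
        return c.getValue();
    }

    // Checksum the whole buffer in CHUNK-sized pieces; returns elapsed nanoseconds.
    static long timeChunks(Checksum c, byte[] data) {
        long sink = 0;
        long start = System.nanoTime();
        for (int off = 0; off < data.length; off += CHUNK) {
            int len = Math.min(CHUNK, data.length - off);
            sink ^= checksumOf(c, data, off, len);
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the JIT cannot discard the loop.
        if (sink == 42) System.out.print("");
        return elapsed;
    }

    public static void main(String[] args) {
        // Arbitrary 64 MB of pseudo-random data held in memory (no disk I/O).
        byte[] data = new byte[64 * 1024 * 1024];
        new Random(1).nextBytes(data);
        // Several rounds, since the first ones include JIT warm-up.
        for (int round = 0; round < 5; round++) {
            System.out.printf("CRC32: %d ms, CRC32C: %d ms%n",
                timeChunks(new CRC32(), data) / 1_000_000,
                timeChunks(new CRC32C(), data) / 1_000_000);
        }
    }
}
```

The numbers only indicate checksum cost in isolation on a given machine and JVM. They say nothing about the HDFS read path itself, so re-running the actual test on the newer version is still the real confirmation.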

-Todd

On Sat, Sep 15, 2012 at 7:13 AM, jlei liu <[EMAIL PROTECTED]> wrote:

> I read 64k of data from the file every time.

--
Todd Lipcon
Software Engineer, Cloudera