HBase dev mailing list: Unscientific comparison of fully-cached zipfian reading


Re: Unscientific comparison of fully-cached zipfian reading
I just did a similar test using PE (PerformanceEvaluation) on a test cluster (16 DataNodes/RegionServers, 158 mappers).
I set it up so that the data does not fit into the aggregate block cache but does fit into the aggregate OS buffer cache; in my case that turned out to be 100M rows of about 1 KB each.
I then ran the SequentialRead and RandomRead tests.
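A rough way to check that kind of setup is to divide the table size by the number of RegionServers and compare it with the per-server cache sizes. The sketch below assumes an even spread of regions and ignores HFile/key overhead and HDFS replication; the function names and the 4 GB / 24 GB cache sizes are hypothetical placeholders, not the test cluster's real configuration.

```python
# Rough per-server sizing: does the table fit in the block cache, the OS
# page cache, both, or neither? The cache sizes below are hypothetical.
GB = 10 ** 9


def per_server_bytes(rows: int, row_bytes: int, servers: int) -> float:
    """Table bytes per RegionServer, assuming regions are spread evenly."""
    return rows * row_bytes / servers


def cache_fit(data_bytes: float,
              block_cache_bytes: float,
              os_cache_bytes: float) -> tuple[bool, bool]:
    """Return (fits in block cache, fits in OS page cache)."""
    return data_bytes <= block_cache_bytes, data_bytes <= os_cache_bytes


data = per_server_bytes(rows=100_000_000, row_bytes=1_000, servers=16)
print(f"{data / GB:.2f} GB per RegionServer")  # 6.25 GB
# Hypothetical: 4 GB of block cache and 24 GB of free page cache per server.
print(cache_fit(data, block_cache_bytes=4 * GB, os_cache_bytes=24 * GB))  # (False, True)
```

With these placeholder sizes each server holds about 6.25 GB of table data, which misses the block cache but fits in the page cache, the regime described above.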

In both cases I see no disk activity (since the data fits into the OS cache). The SequentialRead run finishes in about 7 minutes, whereas the RandomRead run takes over 34 minutes.
This is with CDH 4.2.1, HBase 0.94.7 compiled against it, and short-circuit reads (SCR) enabled.

The only difference is that in the SequentialRead case the next Get is likely to find its block still in the block cache from the previous Get, whereas in the RandomRead case almost every Get needs to fetch a block from the OS cache (as verified by the cache miss rate, which is roughly the same as the request count per RegionServer). Apart from enabling SCR, all other settings are close to the defaults.
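The block-reuse argument can be illustrated with a toy simulation. This is a plain LRU cache, a simplified model and not HBase's own block cache implementation; it assumes 64 KB blocks holding 1 KB rows (64 rows per block), and the table and cache sizes are hypothetical, scaled down so the cache holds only a small fraction of the blocks.

```python
import random
from collections import OrderedDict


class LRUBlockCache:
    """Tiny LRU cache keyed by block id; counts hits and misses."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block id -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, block_id: int) -> None:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1  # in the real system: fetched from the OS cache
        self.blocks[block_id] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used


def miss_rate(row_ids, rows_per_block: int, capacity_blocks: int) -> float:
    cache = LRUBlockCache(capacity_blocks)
    for row in row_ids:
        cache.get(row // rows_per_block)
    return cache.misses / (cache.hits + cache.misses)


ROWS = 200_000        # hypothetical, scaled-down table (3,125 blocks)
ROWS_PER_BLOCK = 64   # assumes 64 KB blocks holding 1 KB rows
CAPACITY = 200        # hypothetical: cache holds ~6% of the blocks

sequential = range(ROWS)
rng = random.Random(42)
random_reads = [rng.randrange(ROWS) for _ in range(ROWS)]

print(f"sequential miss rate: {miss_rate(sequential, ROWS_PER_BLOCK, CAPACITY):.4f}")
print(f"random miss rate:     {miss_rate(random_reads, ROWS_PER_BLOCK, CAPACITY):.4f}")
```

Sequential Gets miss exactly once per block (1 in 64), while random Gets over a table much larger than the cache miss on nearly every request, which matches the observation that misses track the request count.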

In the RandomRead run I see 2000-4000 requests/s per RegionServer and the same number of cache misses per second per RegionServer, meaning each RegionServer brought in about 125-200 MB/s from the OS cache, which seems a tad low.
So this would imply that reading from the OS cache is almost 5x slower than reading from the block cache. It would be interesting to explore the discrepancy.
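The arithmetic behind these two estimates can be sketched as follows. It assumes each block cache miss pulls one 64 KB block (the HBase default block size; the block size actually used is not stated) from the OS cache, and that both PE runs issue the same number of Gets, so the ratio of elapsed times approximates the per-Get cost ratio. The function names are hypothetical.

```python
# Back-of-the-envelope numbers from the RandomRead/SequentialRead runs.
# BLOCK_SIZE is an assumption (64 KB, the HBase default); the thread does
# not state the block size actually used.
BLOCK_SIZE = 64 * 1024  # bytes
MIB = 1024 * 1024


def os_cache_mib_per_sec(misses_per_sec: float, block_size: int = BLOCK_SIZE) -> float:
    """Each block cache miss pulls one block from the OS page cache."""
    return misses_per_sec * block_size / MIB


def slowdown(sequential_minutes: float, random_minutes: float) -> float:
    """Ratio of elapsed times, assuming both runs issue the same number of Gets."""
    return random_minutes / sequential_minutes


print(os_cache_mib_per_sec(2000))  # 125.0
print(round(slowdown(7, 34), 2))   # 4.86
```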
-- Lars

________________________________
 From: Jean-Daniel Cryans <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Wednesday, April 24, 2013 6:01 PM
Subject: Unscientific comparison of fully-cached zipfian reading
 

Hey guys,

I did a little benchmarking to see what kind of numbers we get from the
block cache and the OS cache. Please see:

https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

Hopefully it gives you some ballpark numbers for further discussion.
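The subject refers to zipfian reads, where a row's request frequency falls off roughly as 1/rank^s, so a few hot rows receive most of the traffic. The sketch below generates such a key stream for a cache experiment, assuming exponent s = 1 and a hypothetical 10,000-row table; `zipfian_reads` is an illustrative helper, not the workload generator used for the spreadsheet, which the thread does not describe.

```python
import random
from collections import Counter


def zipf_weights(n: int, s: float = 1.0) -> list[float]:
    """Weight of the row with rank k (1-based) is 1 / k**s."""
    return [1.0 / k ** s for k in range(1, n + 1)]


def zipfian_reads(n_rows: int, n_reads: int, s: float = 1.0, seed: int = 0) -> list[int]:
    """Return n_reads row indexes in [0, n_rows); index 0 is the hottest row."""
    rng = random.Random(seed)
    return rng.choices(range(n_rows), weights=zipf_weights(n_rows, s), k=n_reads)


reads = zipfian_reads(n_rows=10_000, n_reads=100_000)
print(Counter(reads).most_common(3))
# Share of all reads that land on the 100 hottest rows (1% of the table).
hot_share = sum(1 for r in reads if r < 100) / len(reads)
print(f"{hot_share:.2f}")
```

With s = 1, roughly half of all reads hit the hottest 1% of rows, which is why a zipfian workload behaves very differently from PE's uniform RandomRead in terms of cache hit rate.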

J-D