I published a few more numbers after talking to Stack, Elliott, and Todd
(thanks guys). It's the same link BTW.
First, it's interesting that the block cache is slower than direct access
without CRC to the OS buffer. One caveat is that the latter setup still
stores the meta blocks in the BC, so you're still hitting it. So I ran a
"pure" OS buffer + SCR + no CRC test, with the BC completely disabled, and,
well, it turns out that it's slower than the pure BC test. Interesting!
Second, we thought that the BC might scale badly with a lot of blocks, so I
tried swapping our concurrent hash map with Cliff Click's drop-in
replacement. It turns out that it is slower than the Java CHM at that scale
(8 threads hitting 9 machines). I also ran a test with 80 threads but it was
still slower (4.2ms for Cliff Click vs 3.8ms for Java).
Third, I ran a test with checksumming inside HBase with OS buffer + SCR and
disabled HDFS checksumming. Keep in mind that HBase uses PureCRC32 whereas
HDFS will use faster native SSE4 calls. The result is that it was about
300us faster to checksum in HBase even though the checksumming itself is
slower. Fewer OS calls means much greater speed?
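
To make the "checksum in HBase" idea concrete, here is a minimal sketch of
per-chunk checksumming done in user space over a block that is already in
memory, so no extra read is needed to fetch the checksums. It is not the
HBase code: java.util.zip.CRC32 stands in for PureCRC32, and the class name,
method names, and the 16KB chunk size are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32;

public class ChunkedChecksumSketch {
    // Hypothetical chunk size, chosen only for this example.
    static final int BYTES_PER_CHECKSUM = 16 * 1024;

    // Compute one CRC32 value per chunk of the block.
    public static long[] checksums(byte[] block, int bytesPerChecksum) {
        int chunks = (block.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32(); // stand-in for PureCRC32
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, block.length - off);
            crc.reset();
            crc.update(block, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Recompute the checksums and compare them with the stored ones.
    public static boolean verify(byte[] block, long[] expected, int bytesPerChecksum) {
        return Arrays.equals(checksums(block, bytesPerChecksum), expected);
    }

    public static void main(String[] args) {
        byte[] block = new byte[64 * 1024];
        new Random(42).nextBytes(block);
        long[] sums = checksums(block, BYTES_PER_CHECKSUM);
        System.out.println("chunks: " + sums.length
            + ", intact: " + verify(block, sums, BYTES_PER_CHECKSUM));
        block[100] ^= 1; // simulate a single flipped bit
        System.out.println("after corruption: " + verify(block, sums, BYTES_PER_CHECKSUM));
    }
}
```

The point of the sketch is only that verification is pure CPU work once the
block has been read; whether that explains the 300us is still a guess.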
It seems to me that people running in production with a Hadoop version that
has PureCRC32 (Hadoop 1.1.x, 2.0) will benefit from using HBase checksums.
We also agreed that all those numbers can be improved. HBase could use the
native checksumming for example. The block cache could also be profiled.
Anyone interested in the above might want to do micro benchmarks instead of
the macro testing I did to understand what exactly needs improving.
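
As a starting point for such micro benchmarks, here is a rough sketch that
times two of the pieces discussed above in isolation: a CRC32 pass over a
64KB block and a lookup in a ConcurrentHashMap. Everything in it (class name,
block size, map size, iteration counts) is an assumption for illustration,
it is single-threaded, and java.util.zip.CRC32 is not PureCRC32. A proper
harness such as JMH would give more trustworthy numbers.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.CRC32;

public class MicroBenchSketch {
    // Run op warmup times untimed, then return the average nanoseconds
    // per call over the timed iterations.
    public static double avgNanos(Runnable op, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) op.run();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) op.run();
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        byte[] block = new byte[64 * 1024]; // hypothetical block size
        new Random(42).nextBytes(block);
        long[] sink = new long[1]; // keeps the JIT from discarding the work

        // Checksum one block per call.
        CRC32 crc = new CRC32();
        double crcNs = avgNanos(() -> {
            crc.reset();
            crc.update(block, 0, block.length);
            sink[0] += crc.getValue();
        }, 2_000, 20_000);

        // Look up one entry per call in a map holding 100k entries.
        int entries = 100_000;
        ConcurrentHashMap<Long, byte[]> cache = new ConcurrentHashMap<>();
        for (long k = 0; k < entries; k++) cache.put(k, block);
        long[] key = new long[1];
        double getNs = avgNanos(() -> {
            key[0] = (key[0] + 7919) % entries; // stride through the keys
            if (cache.get(key[0]) != null) sink[0]++;
        }, 100_000, 1_000_000);

        System.out.printf("CRC32 over 64KB: %.0f ns/op%n", crcNs);
        System.out.printf("CHM get:         %.0f ns/op%n", getNs);
        System.out.println("(sink: " + sink[0] + ")");
    }
}
```

Swapping the map or the checksum implementation inside the lambdas would
give a like-for-like comparison without the network and HDFS in the way.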
On Wed, Apr 24, 2013 at 6:01 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> Hey guys,
> I did a little benchmarking to see what kind of numbers we get from the
> block cache and the OS cache. Please see:
> Hopefully it gives you some ballpark numbers for further discussion.