I'm discovering HBase and comparing it with other distributed databases I
know much better. I am currently stressing my test platform (servers with
32 GB of RAM, 16 GB of which is allocated to the HBase JVM), and I'm seeing
strange performance. I'm writing lots of well-spread data (100 tables of
100M rows each, in a single column family) and then performing random reads.
I get good read performance while a table holds little data, but on a big
table I only get around 100-300 qps. I'm not swapping, I don't see any long
GC pauses, and the insert rate is still very high, but reads barely return
anything and often end in a SocketTimeoutException ("while waiting for
channel to be ready for read" exceptions, etc.).
I noticed that some StoreFiles were very big (~120 GB), so I adjusted the
compaction strategy to stop compacting such big files (I don't know whether
this is related to my issue).
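For reference, here is a minimal sketch of the size rule I mean, as I understand it: HBase has a `hbase.hstore.compaction.max.size` setting, and store files larger than it are left out of compaction selection. The class and method names below are hypothetical and only illustrate that filtering idea; this is not HBase's actual compaction policy code, and the exact boundary behavior may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: mimics the idea behind hbase.hstore.compaction.max.size,
// i.e. store files larger than a threshold are left out of compaction selection.
public class CompactionSizeFilter {

    /** Returns the sizes (in bytes) of store files still eligible for compaction. */
    static List<Long> eligibleForCompaction(List<Long> storeFileSizes, long maxSizeBytes) {
        List<Long> eligible = new ArrayList<>();
        for (long size : storeFileSizes) {
            // Files above the threshold are skipped instead of being rewritten again.
            if (size <= maxSizeBytes) {
                eligible.add(size);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Example: one ~120 GB store file plus a few smaller flushed files.
        List<Long> sizes = List.of(120 * gb, 2 * gb, 500L * 1024 * 1024, 3 * gb);
        List<Long> eligible = eligibleForCompaction(sizes, 10 * gb);
        System.out.println(eligible.size() + " of " + sizes.size() + " files eligible");
    }
}
```

With a 10 GB threshold, the ~120 GB file is excluded and the program prints "3 of 4 files eligible".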
I also noticed that when I stress the cluster with Get requests, everything
*looks* fine until a RegionServer cannot serve data locally and has to fetch
it from HDFS over the network. This causes heavy, sustained network use
lasting more than 60 seconds, which is what throws the SocketTimeoutException.
How does HBase handle data locality for random access? Could that be a lead
for solving this kind of issue?
My block cache of 5 GB is not full at all...
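To make the locality question concrete, here is a hypothetical sketch of what I understand a region's "locality index" to mean: the fraction of its store-file bytes that have an HDFS replica on the RegionServer's own host. The `Block` record, host names and sizes are made up for illustration; this is not HBase's own implementation. As far as I understand, HDFS places the first replica of a newly written file on the writer's local DataNode, so locality tends to drop after regions move and to recover when a compaction rewrites the files.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a "locality index" for one region: the fraction of
// store-file bytes that have an HDFS replica on the RegionServer's own host.
public class LocalityIndex {

    /** One HDFS block: its size and the hosts holding a replica (illustrative only). */
    record Block(long sizeBytes, Set<String> replicaHosts) {}

    static double localityIndex(List<Block> blocks, String regionServerHost) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.sizeBytes();
            // A block counts as local if any of its replicas lives on this host.
            if (b.replicaHosts().contains(regionServerHost)) {
                local += b.sizeBytes();
            }
        }
        // No data at all: report 0.0 rather than dividing by zero.
        return total == 0 ? 0.0 : (double) local / total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        List<Block> blocks = List.of(
            new Block(128 * mb, Set.of("rs1", "dn2", "dn3")),
            new Block(128 * mb, Set.of("dn2", "dn3", "dn4")), // remote-only block
            new Block(64 * mb, Set.of("rs1", "dn4", "dn5")));
        System.out.printf(Locale.ROOT, "locality index on rs1: %.2f%n",
            localityIndex(blocks, "rs1"));
    }
}
```

With these sample blocks, 192 MB out of 320 MB have a replica on rs1, so the program prints an index of 0.60; every Get that lands on the remote-only block would have to go over the network.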