HBase dev mailing list: Poor HBase random read performance


Thread:
  Varun Sharma          2013-06-29, 19:13
  lars hofhansl         2013-06-29, 22:09
  lars hofhansl         2013-06-29, 22:24
  Varun Sharma          2013-06-29, 22:39
  Varun Sharma          2013-06-29, 23:10
  Vladimir Rodionov     2013-07-01, 18:08
  lars hofhansl         2013-07-01, 19:05
  lars hofhansl         2013-07-01, 19:10
  Varun Sharma          2013-07-01, 23:10
  Vladimir Rodionov     2013-07-01, 23:57
  Vladimir Rodionov     2013-07-02, 00:09
Ted Yu 2013-07-01, 23:27
Re: Poor HBase random read performance
You might also be interested in this benchmark I ran 3 months ago:
https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

J-D

On Sat, Jun 29, 2013 at 12:13 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I was doing some tests on how good HBase random reads are. The setup
> consists of a 1-node cluster with dfs replication set to 1. Short-circuit
> local reads and HBase checksums are enabled. The data set is small enough
> to be largely cached in the file system cache - 10G on a 60G machine.
>
> The client sends out multi-get operations in batches of 10, and I try to
> measure throughput.
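
(For anyone reproducing this setup, a minimal sketch of such a batched
multi-get client against the 0.94-era client API is below; the table name,
row-key format and the single-threaded loop are assumptions, not details
from the thread.)

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetBench {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table");       // hypothetical table name
        long ops = 0;
        long end = System.currentTimeMillis() + 120 * 1000L; // 120 second test window
        while (System.currentTimeMillis() < end) {
          List<Get> batch = new ArrayList<Get>(10);
          for (int i = 0; i < 10; i++) {                     // batch of 10 random gets
            long key = (long) (Math.random() * 10000000L);   // hypothetical key space
            batch.add(new Get(Bytes.toBytes("row-" + key)));
          }
          Result[] results = table.get(batch);               // one multi-get round trip
          ops += results.length;
        }
        System.out.println("ops=" + ops + " throughput=" + (ops / 120) + "/s");
        table.close();
      }
    }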
>
> Test #1
>
> All data was cached in the block cache.
>
> Test Time = 120 seconds
> Num Read Ops = 12M
>
> Throughput = 100K per second
>
> Test #2
>
> I disable the block cache. Now all the data is in the file system cache; I
> verify this by making sure that IOPS on the disk drive are 0 during the
> test. I run the same test with batched ops.
>
> Test Time = 120 seconds
> Num Read Ops = 0.6M
> Throughput = 5K per second
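
(The thread does not say how the block cache was disabled. Two client-side
ways to keep data blocks out of the LRU cache with the 0.94 API are sketched
below; setting hfile.block.cache.size to 0 in hbase-site.xml is commonly
used as the server-wide switch.)

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.Get;

    public class BlockCacheOff {
      // Per request: blocks read on behalf of this Get are not added to the
      // LRU block cache (blocks that are already cached are still used).
      static Get uncachedGet(byte[] row) {
        Get get = new Get(row);
        get.setCacheBlocks(false);
        return get;
      }

      // Per column family: block caching disabled for the whole family.
      static HColumnDescriptor uncachedFamily(String familyName) {
        HColumnDescriptor cf = new HColumnDescriptor(familyName);
        cf.setBlockCacheEnabled(false);
        return cf;
      }
    }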
>
> Test #3
>
> I saw that all the threads were stuck in idLock.lockEntry(), so I now run
> with that lock disabled and the block cache disabled.
>
> Test Time = 120 seconds
> Num Read Ops = 1.2M
> Throughput = 10K per second
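
(idLock here presumably refers to the per-block-offset lock the HFile reader
takes so that only one thread loads a given block while other readers of the
same block wait. The sketch below illustrates that pattern; it is not the
actual HBase code.)

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative per-offset lock: readers of the same block offset
    // serialize, so only one of them performs the underlying read while the
    // rest wait for it to finish.
    public class PerOffsetLock {
      private final ConcurrentHashMap<Long, ReentrantLock> locks =
          new ConcurrentHashMap<Long, ReentrantLock>();

      public ReentrantLock lockEntry(long blockOffset) {
        ReentrantLock lock = new ReentrantLock();
        ReentrantLock existing = locks.putIfAbsent(blockOffset, lock);
        if (existing != null) {
          lock = existing;   // another thread is already reading this block
        }
        lock.lock();         // contended when many gets hit the same block
        return lock;
      }

      public void releaseLockEntry(ReentrantLock lock) {
        lock.unlock();
        // Entry cleanup is omitted here; a real implementation has to
        // remove unused entries carefully.
      }
    }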
>
> Test #4
>
> I re-enable the block cache and this time hack HBase to cache only index
> and bloom blocks; data blocks come from the file system cache.
>
> Test Time = 120 seconds
> Num Read Ops = 1.6M
> Throughput = 13K per second
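
(The hack itself is not shown in the thread. Conceptually it is a
cache-on-read policy keyed off the block category, roughly like the sketch
below; where such a check would be wired in, e.g. around CacheConfig, is an
assumption.)

    import org.apache.hadoop.hbase.io.hfile.BlockType.BlockCategory;

    public class IndexBloomOnlyPolicy {
      // Only index and bloom blocks go into the LRU block cache on read;
      // data blocks are left to the OS file system cache.
      public static boolean shouldCacheOnRead(BlockCategory category) {
        return category == BlockCategory.INDEX
            || category == BlockCategory.BLOOM;
      }
    }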
>
> So I wonder why there is such a massive drop in throughput. I know that
> the HDFS code adds tremendous overhead, but this seems pretty high to me.
> I am using HBase 0.94.7 and CDH 4.2.0.
>
> Thanks
> Varun
More messages in this thread:
  Varun Sharma          2013-07-01, 17:50
  Lars Hofhansl         2013-06-30, 07:45
  Vladimir Rodionov     2013-07-01, 18:26
  Varun Sharma          2013-07-01, 18:30