HBase dev mailing list: Poor HBase random read performance


Varun Sharma 2013-06-29, 19:13
lars hofhansl 2013-06-29, 22:09
lars hofhansl 2013-06-29, 22:24
Varun Sharma 2013-06-29, 22:39
Varun Sharma 2013-06-29, 23:10
Vladimir Rodionov 2013-07-01, 18:08
lars hofhansl 2013-07-01, 19:05
lars hofhansl 2013-07-01, 19:10
Varun Sharma 2013-07-01, 23:10
Vladimir Rodionov 2013-07-01, 23:57
Vladimir Rodionov 2013-07-02, 00:09
Ted Yu 2013-07-01, 23:27
Re: Poor HBase random read performance
You might also be interested in this benchmark I ran 3 months ago:
https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

J-D

On Sat, Jun 29, 2013 at 12:13 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I was doing some tests on how good HBase random reads are. The setup
> consists of a 1-node cluster with dfs replication set to 1. Short circuit
> local reads and HBase checksums are enabled. The data set is small enough
> to be largely cached in the filesystem cache - 10G on a 60G machine.
>
> The client sends out multi-get operations in batches of 10, and I try to measure
> throughput.
>
> Test #1
>
> All data was cached in the block cache.
>
> Test Time = 120 seconds
> Num Read Ops = 12M
>
> Throughput = 100K per second
>
> Test #2
>
> I disable the block cache, but all the data is still in the file system
> cache. I verify this by making sure that IOPS on the disk drive are 0 during
> the test. I run the same test with batched ops.
>
> Test Time = 120 seconds
> Num Read Ops = 0.6M
> Throughput = 5K per second
>
> Test #3
>
> I saw that all the threads were now stuck in IdLock.lockEntry(). So I now run
> with that lock disabled and the block cache disabled.
>
> Test Time = 120 seconds
> Num Read Ops = 1.2M
> Throughput = 10K per second
>
> Test #4
>
> I re-enable the block cache and this time hack HBase to cache only index and
> bloom blocks, so data blocks come from the file system cache.
>
> Test Time = 120 seconds
> Num Read Ops = 1.6M
> Throughput = 13K per second
>
> So I wonder why there is such a massive drop in throughput. I know that the
> HDFS code path adds tremendous overhead, but this seems pretty high to me. I
> am using HBase 0.94.7 and CDH 4.2.0.
>
> Thanks
> Varun
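
[Editor's note] For readers following along, a client issuing batched multi-gets like the ones described above might look roughly like the sketch below, written against the 0.94-era client API. The table name, row keys, and batch contents are placeholders, not details taken from this thread.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table");    // placeholder table name
        try {
          // Build one batch of 10 gets, as in the tests above.
          List<Get> batch = new ArrayList<Get>(10);
          for (int i = 0; i < 10; i++) {
            Get get = new Get(Bytes.toBytes("row-" + i)); // placeholder row keys
            get.setCacheBlocks(false);                    // bypass the block cache, as in Test #2
            batch.add(get);
          }
          // The client groups the gets by region server and issues multi calls.
          Result[] results = table.get(batch);
          System.out.println("fetched " + results.length + " rows");
        } finally {
          table.close();
        }
      }
    }

In a real benchmark the client would loop over many such batches from several threads and count completed gets over the test window.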
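[Editor's note] On the Test #3 observation about threads stuck in IdLock.lockEntry(): HBase uses a per-id lock of that kind in the HFile reader so that only one thread loads a given block while the others wait for the result. Below is a simplified, illustrative sketch of such a per-id lock; the SimpleIdLock class is made up for this note and is not HBase's actual IdLock implementation.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.CountDownLatch;

    // Illustrative per-id lock: the first thread to lock an id proceeds, later
    // threads asking for the same id wait until it is released. This is only a
    // sketch of the idea; it is not the IdLock class shipped with HBase.
    public class SimpleIdLock {
      private final ConcurrentMap<Long, CountDownLatch> locks =
          new ConcurrentHashMap<Long, CountDownLatch>();

      public void lockEntry(long id) throws InterruptedException {
        while (true) {
          CountDownLatch latch = new CountDownLatch(1);
          CountDownLatch existing = locks.putIfAbsent(id, latch);
          if (existing == null) {
            return;             // we now hold the lock for this id
          }
          existing.await();     // another thread holds it; wait, then retry
        }
      }

      public void releaseEntry(long id) {
        CountDownLatch latch = locks.remove(id);
        if (latch != null) {
          latch.countDown();    // wake up any waiters so they can retry
        }
      }
    }

In the HBase reader the id being locked is, roughly, the block's offset in the HFile, which is why concurrent readers of a hot block can pile up on the same entry.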
Varun Sharma 2013-07-01, 17:50
Lars Hofhansl 2013-06-30, 07:45
Vladimir Rodionov 2013-07-01, 18:26
Varun Sharma 2013-07-01, 18:30