HBase, mail # user - Performance between HBaseClient scan and HFileReaderV2


Jerry Lam 2013-12-23, 20:18
Hello HBase users,

I just ran a very simple performance test and would like to know whether
what I experienced makes sense.

The experiment is as follows:
- I filled an HBase region with 700MB of data (each row has roughly 45
columns, and the entire row is about 20KB)
- I configured the region to hold 4GB (therefore no split occurs)
- I ran compactions after the data was loaded and made sure that there is
only 1 region in the table under test.
- No other table exists in the HBase cluster because this is a DEV
environment
- I'm using HBase 0.92.1

The test is very basic. I used HBaseClient to scan the entire region,
retrieving all rows and all columns in the table and simply iterating over
all KeyValue pairs until it was done. It took about 1 minute 22 sec to
complete. (Note that I disabled the block cache and used a scanner caching
size of about 10000.)
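
Scanner caching sets how many rows the client pulls from the region server
per fetch. As a rough back-of-the-envelope sketch (assuming 700 MB and 20 KB
are binary units and caching is exactly 10000; the class and method names are
hypothetical), the figures above imply only a handful of very large batches:

```java
// Hypothetical back-of-the-envelope model of scanner caching:
// assumes the client fetches up to `caching` rows per round trip.
class ScanBatchEstimate {

    // Number of batch fetches needed to pull `rows` rows, `caching` rows at a time.
    static long roundTrips(long rows, long caching) {
        return (rows + caching - 1) / caching; // ceiling division
    }

    // Approximate payload of one full batch, in bytes.
    static long batchBytes(long caching, long rowBytes) {
        return caching * rowBytes;
    }

    public static void main(String[] args) {
        long regionBytes = 700L * 1024 * 1024; // ~700 MB of data (assumed binary MB)
        long rowBytes = 20L * 1024;            // ~20 KB per row (assumed binary KB)
        long caching = 10_000;                 // scanner caching from the post

        long rows = regionBytes / rowBytes;    // 35840 rows
        System.out.println("rows        = " + rows);
        System.out.println("round trips = " + roundTrips(rows, caching));
        System.out.println("batch MB    = " + batchBytes(caching, rowBytes) / (1024 * 1024));
    }
}
```

Under these assumptions the whole scan is about 35840 rows fetched in roughly
4 batches of up to ~195 MB each; this only restates the post's numbers and
does not by itself explain the timing.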

I ran another test using HFileReaderV2 to scan the entire region,
retrieving all rows and all columns and simply iterating over all KeyValue
pairs until it was done. It took 11 sec.

The performance difference is dramatic (82 sec vs. 11 sec, roughly 7.5 times
faster using HFileReaderV2).
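
To put the two timings on a common scale, a small sketch converting them to
throughput (assuming the full 700 MB is read in both runs; the class name is
hypothetical):

```java
// Back-of-the-envelope throughput for the two reported scans.
// Assumes both scans read the full ~700 MB region.
class ScanThroughput {

    // Throughput in MB/s for `megabytes` read in `seconds`.
    static double mbPerSecond(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        double dataMb = 700.0;       // region size from the post
        double clientSeconds = 82.0; // 1 min 22 sec via the client scan
        double hfileSeconds = 11.0;  // direct HFileReaderV2 scan

        System.out.printf("client scan: %.1f MB/s%n", mbPerSecond(dataMb, clientSeconds));
        System.out.printf("HFile scan:  %.1f MB/s%n", mbPerSecond(dataMb, hfileSeconds));
        System.out.printf("speedup:     %.2fx%n", clientSeconds / hfileSeconds);
    }
}
```

This prints about 8.5 MB/s for the client scan, 63.6 MB/s for the direct
HFile scan, and a speedup of 7.45x.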

I would like to know why the difference is so large, or whether I have not
configured HBase properly. Since the HFileReaderV2 scan reads the same data
from HDFS in 11 sec, HDFS can clearly deliver the data efficiently, so it is
not the bottleneck.

Any help is appreciated!

Jerry