Re: Performance between HBaseClient scan and HFileReaderV2
On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:

> Hello HBase users,
>
> I just ran a very simple performance test and would like to see if what I
> experienced makes sense.
>
> The experiment is as follows:
> - I filled an HBase region with 700MB of data (each row has roughly 45
> columns and the entire row is about 20KB)
> - I configured the region to hold 4GB (therefore no split occurs)
> - I ran compactions after the data was loaded and made sure that there is
> only 1 region in the table under test.
> - No other table exists in the HBase cluster because this is a DEV
> environment
> - I'm using HBase 0.92.1
>
>
Can you try a 0.94 release?  It has had some scanner improvements.

Thanks,
St.Ack

> The test is very basic. I used HBaseClient to scan the entire region to
> retrieve all rows and all columns in the table, just iterating over all
> KeyValue pairs until done. It took about 1 minute 22 sec to complete. (Note
> that I disabled the block cache and used a caching size of about 10000.)
>
> I ran another test using HFileReaderV2 and scanned the entire region to
> retrieve all rows and all columns, just iterating over all KeyValue pairs
> until done. It took 11 sec.
>
> The performance difference is dramatic (almost 8 times faster using
> HFileReaderV2).
>
> I want to know why the difference is so big, or whether I didn't configure
> HBase properly. This experiment suggests that HDFS can deliver the data
> efficiently, so it is not the bottleneck.
>
> Any help is appreciated!
>
> Jerry
>
>
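
As a back-of-the-envelope check on the numbers in the quoted message, the short Python sketch below works out the throughput of each run, the approximate row count, and how much data a single scanner batch would carry. It assumes that MB and KB are binary units, that every row is exactly 20KB, and that the "caching size" refers to scanner caching (rows fetched per client call); none of these are confirmed in the thread, and the calculation does not by itself explain the gap.

```python
import math

# Figures taken from the quoted message.
DATA_MB = 700                # data loaded into the single region
ROW_KB = 20                  # approximate size of one row
SCANNER_CACHING = 10000      # assumed to mean rows fetched per client call
CLIENT_SCAN_SECONDS = 82     # 1 minute 22 sec via the client scan
HFILE_READ_SECONDS = 11      # direct read via HFileReaderV2


def throughput_mb_per_s(data_mb, seconds):
    """Average throughput of a full pass over the data."""
    return data_mb / seconds


def approx_rows(data_mb, row_kb):
    """Rough row count, assuming every row is exactly row_kb (binary units)."""
    return data_mb * 1024 // row_kb


def batch_size_mb(caching, row_kb):
    """Approximate payload of one scanner batch under the same assumption."""
    return caching * row_kb / 1024


client_rate = throughput_mb_per_s(DATA_MB, CLIENT_SCAN_SECONDS)
hfile_rate = throughput_mb_per_s(DATA_MB, HFILE_READ_SECONDS)
speedup = CLIENT_SCAN_SECONDS / HFILE_READ_SECONDS
rows = approx_rows(DATA_MB, ROW_KB)
batch_mb = batch_size_mb(SCANNER_CACHING, ROW_KB)
batches = math.ceil(rows / SCANNER_CACHING)

print(f"client scan throughput: {client_rate:.1f} MB/s")
print(f"HFile read throughput:  {hfile_rate:.1f} MB/s")
print(f"ratio of elapsed times: {speedup:.2f}x")
print(f"approx. rows: {rows}, batches: {batches}, ~{batch_mb:.0f} MB per batch")
```

Under these assumptions the client scan moves about 8.5 MB/s against roughly 63.6 MB/s for the direct HFile read, and a caching value of 10000 implies batches of roughly 195MB each, so the whole region would be fetched in about 4 round trips.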