HBase >> mail # user >> Performance between HBaseClient scan and HFileReaderV2


Re: Performance between HBaseClient scan and HFileReaderV2
Hello St.Ack,

I would like to switch to 0.94, but we are using 0.92.1 and will not
change until the end of 2014. I can change the HBase "client" (e.g.
AsyncHBase) if that is the bottleneck. If the problem is on the server side
(e.g. the regionserver), is there anything I can do to improve the
performance?

Best Regards,

Jerry
On Thu, Jan 2, 2014 at 11:23 AM, Stack <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>
> > Hello HBase users,
> >
> > I just ran a very simple performance test and would like to see if what I
> > experienced makes sense.
> >
> > The experiment is as follows:
> > - I filled an HBase region with 700 MB of data (each row has roughly 45
> >   columns, and the entire row is about 20 KB)
> > - I configured the region to hold 4 GB (so no split occurs)
> > - I ran compactions after the data was loaded and made sure that there is
> >   only 1 region in the table under test.
> > - No other table exists in the HBase cluster because this is a DEV
> >   environment
> > - I'm using HBase 0.92.1
> >
> >
> Can you use a 0.94?  It has had some scanner improvements.
>
> Thanks,
> St.Ack
>
>
>
> > The test is very basic. I use HBaseClient to scan the entire region to
> > retrieve all rows and all columns in the table, just iterating over all
> > KeyValue pairs until it is done. It took about 1 minute 22 sec to
> > complete. (Note that I disabled the block cache and used a scanner caching
> > size of about 10000.)
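As a rough sketch of what the scanner caching size of about 10000 implies for this test: assuming decimal units and a uniform row size, 700 MB at about 20 KB per row is on the order of 35,000 rows, so the whole region would come back in only about four batches, each carrying up to roughly 200 MB. The class and method names are hypothetical and this is plain arithmetic, not HBase API; it also ignores the scanner open/close calls.

```java
// Hypothetical back-of-the-envelope model of a client scan with scanner caching.
// Assumes decimal units (1 MB = 1000 KB) and a uniform ~20 KB row size.
public class ScanCachingEstimate {

    // Approximate number of rows in the region.
    static long estimateRows(long regionKb, long rowKb) {
        return regionKb / rowKb;
    }

    // Each batch returns up to `caching` rows, so the number of batches
    // is the ceiling of rows / caching (open/close calls not counted).
    static long batches(long rows, long caching) {
        return (rows + caching - 1) / caching;
    }

    // Upper bound on the payload of a single batch, in KB.
    static long batchKb(long caching, long rowKb) {
        return caching * rowKb;
    }

    public static void main(String[] args) {
        long regionKb = 700L * 1000; // ~700 MB of data, as reported
        long rowKb = 20;             // ~20 KB per row, as reported
        long caching = 10000;        // scanner caching size used in the test

        long rows = estimateRows(regionKb, rowKb);
        System.out.println("rows ~ " + rows);                          // rows ~ 35000
        System.out.println("batches ~ " + batches(rows, caching));     // batches ~ 4
        System.out.println("max batch ~ " + batchKb(caching, rowKb) / 1000 + " MB"); // 200 MB
    }
}
```

If these estimates hold, the number of round trips is small, so RPC latency by itself seems unlikely to account for a gap of more than a minute; the very large per-batch payload is worth keeping in mind when picking a caching value.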
> >
> > I ran another test using HFileReaderV2 to scan the entire region,
> > retrieving all rows and all columns and just iterating over all KeyValue
> > pairs until it was done. It took 11 sec.
> >
> > The performance difference is dramatic (HFileReaderV2 is roughly 7.5
> > times faster: 11 sec vs. 82 sec).
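To put the two timings in concrete terms, the small calculation below uses the approximate figures reported in the message (700 MB, 1 min 22 s for the client scan, 11 s for the direct HFileReaderV2 read). It gives roughly 8.5 MB/s versus roughly 63.6 MB/s, about a 7.5x difference. The class name is illustrative, and the results are only as precise as the rounded inputs.

```java
// Illustrative throughput comparison for the two reported read paths.
// Inputs are the rounded figures from the message; decimal MB assumed.
public class ScanThroughput {

    // Throughput in MB/s.
    static double throughput(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        double dataMb = 700.0;
        double clientScanSec = 82.0; // 1 min 22 s through the client scan
        double hfileReadSec = 11.0;  // direct HFileReaderV2 scan

        System.out.printf("client scan:  %.1f MB/s%n", throughput(dataMb, clientScanSec)); // 8.5
        System.out.printf("HFile reader: %.1f MB/s%n", throughput(dataMb, hfileReadSec));  // 63.6
        System.out.printf("speedup:      %.2fx%n", clientScanSec / hfileReadSec);           // 7.45
    }
}
```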
> >
> > I want to know why the difference is so big, or whether I didn't configure
> > HBase properly. From this experiment, HDFS can deliver the data
> > efficiently, so it is not the bottleneck.
> >
> > Any help is appreciated!
> >
> > Jerry
> >
> >
>