HBase, mail # user - Performance between HBaseClient scan and HFileReaderV2


Re: Performance between HBaseClient scan and HFileReaderV2
Jerry Lam 2014-01-02, 21:32
Hello Vladimir,

In my use case, I guarantee that a major compaction is executed before any
scan happens, because the system we are building is read-only. There will
be no deleted cells. Additionally, I only need to read from a single
column family, and therefore I don't need to access multiple HFiles.
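
To illustrate why a single, fully compacted HFile avoids the merge step: when cells come from several sorted files, a scanner has to merge them to preserve global key order. The following is a minimal sketch using plain Java collections. Sorted lists of strings stand in for HFiles, the class and method names are hypothetical, and none of this is HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SortedFileMerge {
    /** One sorted input together with its current smallest unread key. */
    private static final class Source implements Comparable<Source> {
        final Iterator<String> rest;
        String head;

        Source(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }

        @Override
        public int compareTo(Source other) {
            return head.compareTo(other.head);
        }
    }

    /** Merges several individually sorted key lists into one sorted list. */
    public static List<String> merge(List<List<String>> sortedFiles) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<String> file : sortedFiles) {
            if (!file.isEmpty()) {
                heap.add(new Source(file.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source smallest = heap.poll();   // input whose head key is smallest
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {   // advance that input and re-queue it
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("row1", "row4"),
                Arrays.asList("row2", "row3"));
        System.out.println(merge(files));
    }
}
```

With only one input list the heap never holds more than one entry, so the merge degenerates into a plain sequential read, which is the situation described above.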

Filter conditions are only nice to have: if I can read an HFile 8x faster
than with HBaseClient, I can apply the filter on the client side and still
come out faster than with HBaseClient.
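
A minimal sketch of what client-side filtering amounts to: every cell is read, and the unwanted ones are discarded afterwards. The `Cell` class here is a hypothetical stand-in for an HBase KeyValue, and the class and method names are assumptions for illustration only, not HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClientSideFilter {
    /** Hypothetical stand-in for a KeyValue: row key, column qualifier, value. */
    public static final class Cell {
        public final String row;
        public final String qualifier;
        public final String value;

        public Cell(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Keeps only cells with the wanted qualifier; input order is preserved. */
    public static List<Cell> filterByQualifier(Iterable<Cell> cells, String wantedQualifier) {
        List<Cell> kept = new ArrayList<>();
        for (Cell cell : cells) {
            if (cell.qualifier.equals(wantedQualifier)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Pretend these cells were read sequentially from a single file.
        List<Cell> scanned = Arrays.asList(
                new Cell("row1", "a", "1"),
                new Cell("row1", "b", "2"),
                new Cell("row2", "a", "3"));
        for (Cell cell : filterByQualifier(scanned, "a")) {
            System.out.println(cell.row + "/" + cell.qualifier + "=" + cell.value);
        }
    }
}
```

The trade-off is that all data still has to be read before it is dropped, so this only pays off while the raw read remains much faster than a server-side filtered scan.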

Thank you for your input!

Jerry

On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
<[EMAIL PROTECTED]> wrote:

> An HBase scanner MUST guarantee the correct order of KeyValues (coming from
> different HFiles), apply filter conditions as well as restrictions on
> included column families and qualifiers, time range, and max versions, and
> correctly process deleted cells. A direct HFileReader does none of the above.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: Jerry Lam [[EMAIL PROTECTED]]
> Sent: Thursday, January 02, 2014 7:56 AM
> To: user
> Subject: Re: Performance between HBaseClient scan and HFileReaderV2
>
> Hi Tom,
>
> Good point. Note that I also ran the HBaseClient performance test several
> times (as you can see from the chart). Caching should therefore have
> benefited the later HBaseClient runs as well, not just the HFileReaderV2
> test.
>
> I still don't understand what makes HBaseClient perform so poorly
> compared with accessing HDFS directly. I could understand a factor of 2
> (and even that is too much), but a factor of 8 is quite unreasonable.
>
> Any hint?
>
> Jerry
>
>
>
> On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <[EMAIL PROTECTED]> wrote:
>
> > I'm also new to HBase and am not familiar with HFileReaderV2. However, in
> > your description, you didn't mention anything about clearing the Linux OS
> > cache between tests. That might be why you're seeing the big difference:
> > if you ran the HBaseClient test first, it may have warmed the OS cache,
> > and HFileReaderV2 then benefited from it. Just a guess...
> >
> > -- Tom
> >
> >
> >
> > On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> >
> > > Hello HBase users,
> > >
> > > I just ran a very simple performance test and would like to see if what
> > > I experienced makes sense.
> > >
> > > The experiment is as follows:
> > > - I filled an HBase region with 700 MB of data (each row has roughly 45
> > >   columns, and the entire row is about 20 KB)
> > > - I configured the region to hold 4 GB (so no split occurs)
> > > - I ran compactions after the data was loaded and made sure that there
> > >   is only 1 region in the table under test
> > > - No other table exists in the HBase cluster because this is a DEV
> > >   environment
> > > - I'm using HBase 0.92.1
> > >
> > > The test is very basic. I use HBaseClient to scan the entire region and
> > > retrieve all rows and all columns in the table, just iterating over all
> > > KeyValue pairs until done. It took about 1 minute 22 seconds to complete.
> > > (Note that I disabled the block cache and used a caching size of about
> > > 10000.)
> > >
> > > I ran another test that uses HFileReaderV2 to scan the entire region and
> > > retrieve all rows and all columns, again just iterating over all KeyValue
> > > pairs until done. It took 11 seconds.
> > >
> > > The performance difference is dramatic (almost 8 times faster using
> > > HFileReaderV2).
> > >
> > > I want to know why the difference is so big, or whether I didn't
> > > configure HBase properly. From this experiment, HDFS can deliver the
> > > data efficiently, so it is not the bottleneck.
> > >
> > > Any help is appreciated!
> > >
> > > Jerry
> > >
> > >
> >
>