
HBase user mailing list: Performance between HBaseClient scan and HFileReaderV2


Jerry Lam 2013-12-23, 20:18
Tom Hood 2013-12-30, 02:09
Jerry Lam 2014-01-02, 15:56
Vladimir Rodionov 2014-01-02, 18:30
Jean-Marc Spaggiari 2014-01-02, 18:35
Jerry Lam 2014-01-02, 21:32
Sergey Shelukhin 2014-01-02, 21:42
Sergey Shelukhin 2014-01-02, 21:43
Enis Söztutar 2014-01-02, 22:02
Jerry Lam 2014-01-02, 23:31
Ted Yu 2014-01-02, 23:35
lars hofhansl 2014-01-02, 21:45
lars hofhansl 2014-01-02, 21:44
Re: Performance between HBaseClient scan and HFileReaderV2
Hello Lars,

Yes, I used setCaching to fetch more rows in each RPC call. And yes, when I
used HFileReaderV2 I was still reading from HDFS. Short-circuit reads are
enabled, but I don't know how to verify that they are actually being used
(is there a log that shows whether they have been used?).

I did make sure the HBaseClient runs on the same machine as the regionserver
that holds the data.

I just tried asynchbase (I'm running out of ideas, so I'm trying
everything): it takes 60 seconds to scan through the data, 20 seconds less
than with HBaseClient.

Best Regards,

Jerry
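As a rough illustration of what scanner caching buys: Scan.setCaching sets how many rows the region server returns per next() RPC, so a full scan needs roughly ceil(rows / caching) fetch round trips. The sketch below is a simplified estimate under stated assumptions: it ignores the scanner open/close calls and region boundaries, and the class name and row count are hypothetical placeholders, not figures from this thread.

```java
final class ScanRpcEstimate {
    // Simplified estimate: each next() RPC returns up to `caching` rows,
    // so a scan over `rows` rows needs ceil(rows / caching) fetch round trips.
    // Ignores the extra open/close scanner calls and region boundaries.
    static long fetchRpcs(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 10_000_000L; // hypothetical table size, not from the thread
        for (int caching : new int[] {1, 100, 1_000, 10_000}) {
            System.out.printf("caching=%,d -> %,d fetch RPCs%n", caching, fetchRpcs(rows, caching));
        }
    }
}
```

Once caching is in the hundreds or thousands, the round-trip count is already small, so the remaining gap is more likely to sit elsewhere (for example in shipping every KeyValue through the network, as Lars points out below), which a profiler can confirm.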

On Thu, Jan 2, 2014 at 4:44 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> From the below I gather you set scanner caching (Scan.setCaching(...))?
> When you use HFileReaderV2, you're still reading from HDFS, right? Are you
> using short circuit reading (avoiding network IO)?
>
> With HBaseClient, you pipe all the data through the network again.
> Is the HBaseClient located on a different machine?
>
> I would use a profiler to see where the time is spent (just use jVisualVM,
> which ships with the JDK, and use its "sampling" profiler).
>
> Lastly, to echo what other folks have said, 0.92 is pretty old at this
> point; I personally added a lot of performance improvements to HBase
> during the 0.94 timeframe, and others have as well.
> If you could test the same with 0.94, I'd be very interested in the
> numbers.
>
> -- Lars
>
>
>
> ________________________________
> From: Jerry Lam <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Thursday, January 2, 2014 1:32 PM
> Subject: Re: Performance between HBaseClient scan and HFileReaderV2
>
>
> Hello Vladimir,
>
> In my use case, I guarantee that a major compaction is executed before any
> scan happens, because the system we are building is read-only. There will
> be no deleted cells. Additionally, I only need to read from a single
> column family, and therefore I don't need to access multiple HFiles.
>
> Filter conditions are only a nice-to-have: if I can read the HFile 8x
> faster than with HBaseClient, I can apply the filter on the client side and
> still be faster than HBaseClient.
>
> Thank you for your input!
>
> Jerry
>
>
>
>
> On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
> <[EMAIL PROTECTED]> wrote:
>
> > An HBase scanner MUST guarantee the correct order of KeyValues (coming
> > from different HFiles), apply the filter conditions, the included column
> > families and qualifiers, the time range, and max versions, and correctly
> > process deleted cells. A direct HFileReader does none of the above.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [EMAIL PROTECTED]
> >
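Vladimir's list can be made concrete with a toy model. The sketch below assumes a hypothetical single-column Cell record (a real HBase KeyValue also carries family, qualifier and a type byte) and a hypothetical MergeScanSketch class. It merges several individually sorted "files" with a priority queue (a standard k-way merge), keeps at most maxVersions versions per row, and hides cells masked by a delete marker. It is not HBase's implementation, only an illustration of the per-cell work a region-server scan does and a raw HFile read skips. Requires Java 16+ for records and Stream.toList().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MergeScanSketch {
    // Hypothetical, simplified cell: one column per row.
    record Cell(String row, long timestamp, boolean deleteMarker, String value) {}

    // Order: row ascending, then newest timestamp first, then delete markers
    // before puts with the same timestamp (so the marker is seen first).
    static int compare(Cell a, Cell b) {
        int c = a.row().compareTo(b.row());
        if (c == 0) c = Long.compare(b.timestamp(), a.timestamp());
        if (c == 0) c = Boolean.compare(b.deleteMarker(), a.deleteMarker());
        return c;
    }

    // Cursor over one individually sorted "file".
    private static final class Cursor {
        private final Iterator<Cell> it;
        Cell head;

        Cursor(Iterator<Cell> it) {
            this.it = it;
            this.head = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) return false;
            head = it.next();
            return true;
        }
    }

    // k-way merge of sorted files, applying max versions and delete markers.
    static List<Cell> scan(List<List<Cell>> files, int maxVersions) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((x, y) -> compare(x.head, y.head));
        for (List<Cell> file : files) {
            if (!file.isEmpty()) heap.add(new Cursor(file.iterator()));
        }
        List<Cell> out = new ArrayList<>();
        String row = null;
        int versions = 0;
        long deletedUpTo = Long.MIN_VALUE; // cells with timestamp <= this are masked
        while (!heap.isEmpty()) {
            Cursor cursor = heap.poll();
            Cell cell = cursor.head;
            if (!cell.row().equals(row)) { // new row: reset per-row state
                row = cell.row();
                versions = 0;
                deletedUpTo = Long.MIN_VALUE;
            }
            if (cell.deleteMarker()) {
                deletedUpTo = Math.max(deletedUpTo, cell.timestamp());
            } else if (cell.timestamp() > deletedUpTo && versions < maxVersions) {
                out.add(cell);
                versions++;
            }
            if (cursor.advance()) heap.add(cursor);
        }
        return out;
    }
}
```

In the setup Jerry describes (one major-compacted HFile, one column family, no deletes), this toy heap would hold a single cursor and the delete and version checks would never fire, so in this model the merge itself is cheap; that hints the 8x gap lies in costs the toy ignores.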
> > ________________________________________
> > From: Jerry Lam [[EMAIL PROTECTED]]
> > Sent: Thursday, January 02, 2014 7:56 AM
> > To: user
> > Subject: Re: Performance between HBaseClient scan and HFileReaderV2
> >
> > Hi Tom,
> >
> > Good point. Note that I also ran the HBaseClient performance test several
> > times (as you can see from the chart). The OS cache should therefore also
> > have benefited the later HBaseClient runs, not just the HFileReaderV2 test.
> >
> > I still don't understand what makes HBaseClient perform so poorly
> > compared with reading directly from HDFS. I could understand maybe a
> > factor of 2 (even that seems like too much), but a factor of 8 is quite
> > unreasonable.
> >
> > Any hints?
> >
> > Jerry
> >
> >
> >
> > On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <[EMAIL PROTECTED]> wrote:
> >
> > > I'm also new to HBase and am not familiar with HFileReaderV2. However, in
> > > your description you didn't mention clearing the Linux OS cache between
> > > tests. That might explain the big difference: if you ran the HBaseClient
> > > test first, it may have warmed the OS cache, and HFileReaderV2 then
> > > benefited from it. Just a guess...
> > >
> > > -- Tom
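Tom's caution can be built into the test itself. The sketch below (the InterleavedBenchmark class and its method names are hypothetical) times two workloads alternately, so each one runs first in half of the rounds, calls a caller-supplied hook before every measured run, and reports the median wall-clock time per workload. In Jerry's case the two Runnables would wrap the HBaseClient scan and the HFileReaderV2 read; the busy-wait workloads in main are placeholders. On Linux the hook could drop the page cache (as root: run sync, then write 3 to /proc/sys/vm/drop_caches); that step is left to the caller and is not implemented here.

```java
import java.util.Arrays;

final class InterleavedBenchmark {
    // Times one run of `work` in milliseconds, calling `beforeRun` first
    // (e.g. a hypothetical hook that drops the OS page cache).
    static long timeMillis(Runnable work, Runnable beforeRun) {
        beforeRun.run();
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    static double median(long[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] s = samples.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Runs A and B alternately (A,B then B,A, ...) so neither workload always
    // inherits a cache warmed by the other; returns {median A ms, median B ms}.
    static double[] compare(Runnable a, Runnable b, int rounds, Runnable beforeRun) {
        long[] ta = new long[rounds];
        long[] tb = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            if (i % 2 == 0) {
                ta[i] = timeMillis(a, beforeRun);
                tb[i] = timeMillis(b, beforeRun);
            } else {
                tb[i] = timeMillis(b, beforeRun);
                ta[i] = timeMillis(a, beforeRun);
            }
        }
        return new double[] {median(ta), median(tb)};
    }

    public static void main(String[] args) {
        Runnable noReset = () -> {}; // replace with a real cache-dropping step on a test box
        double[] m = compare(() -> busyWait(20), () -> busyWait(5), 6, noReset);
        System.out.printf("A: %.1f ms, B: %.1f ms%n", m[0], m[1]);
    }

    // Placeholder workload for the demo: spins for roughly `millis` milliseconds.
    private static void busyWait(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        while (System.nanoTime() < end) { }
    }
}
```

Reporting the median rather than a single run also damps one-off effects such as JIT warm-up or a GC pause landing in one measurement.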
Stack 2014-01-02, 16:23
Jerry Lam 2014-01-02, 17:18
Andrew Purtell 2014-01-02, 17:47
lars hofhansl 2014-01-02, 18:54