Re: Performance between HBaseClient scan and HFileReaderV2
Hello Lars,

Yes, I used setCaching to fetch more KeyValues in each RPC call. And yes,
when I use HFileReaderV2 I am still reading from HDFS. Short-circuit reading
is enabled, but I don't know how to confirm that it is actually being used
(is there a log message that shows it?).

I did make sure the HBaseClient runs on the same regionserver that holds
the data.

I just tried asynchbase (I'm running out of ideas, so I have started trying
everything). It takes 60 seconds to scan through the data (20 seconds less
than with HBaseClient).

Best Regards,

Jerry
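
For reference, the effect of the scanner caching mentioned above can be
estimated with simple arithmetic: if each scanner RPC returns up to `caching`
rows, a scan over N rows needs roughly ceil(N / caching) round trips. The
sketch below only illustrates that estimate. The class name, the row count,
and the assumption that every RPC returns a full batch are hypothetical and
do not come from this thread; it also ignores the calls that open and close
the scanner.

```java
// Hypothetical helper for illustration only; this is not part of the HBase API.
final class ScanRpcEstimate {

    /**
     * Rough number of scanner round trips needed to fetch `rows` rows when
     * each RPC returns at most `caching` rows (the value given to
     * Scan.setCaching). Ignores scanner open/close calls.
     */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        // Ceiling division without floating point.
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // assumed table size, not a figure from the thread
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " RPCs");
        }
    }
}
```

Once caching is large enough that the round-trip count is small, further
increases are unlikely to explain a large gap, which is one reason a profiler
run is the more useful next step.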

On Thu, Jan 2, 2014 at 4:44 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> From the below I gather you set scanner caching (Scan.setCaching(...))?
> When you use HFileReaderV2, you're still reading from HDFS, right? Are you
> using short circuit reading (avoiding network IO)?
>
> In the HBaseClient client you pipe all the data through the network again.
> Is the HBaseClient located on a different machine?
>
> I would use a profiler (jVisualVM ships with the JDK; just use its
> "sampling" profiler) to see where the time is spent.
>
> Lastly, to echo what other folks have said, 0.92 is pretty old at this
> point and I personally added a lot of performance improvements to HBase
> during the 0.94 timeframe and others have as well.
> If you could test the same with 0.94, I'd be very interested in the
> numbers.
>
> -- Lars
>
>
>
> ________________________________
>  From: Jerry Lam <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Thursday, January 2, 2014 1:32 PM
> Subject: Re: Performance between HBaseClient scan and HFileReaderV2
>
>
> Hello Vladimir,
>
> In my use case, I guarantee that a major compaction is executed before any
> scan happens because the system we are building is read-only. There will
> be no deleted cells. Additionally, I only need to read from a single
> column family and therefore I don't need to access multiple HFiles.
>
> Filter conditions are only nice to have: if I can read the HFile 8x faster
> than with HBaseClient, I can do the filtering on the client side and still
> be faster than HBaseClient.
>
> Thank you for your input!
>
> Jerry
>
>
>
>
> On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
> <[EMAIL PROTECTED]> wrote:
>
> > The HBase scanner MUST guarantee the correct order of KeyValues (coming
> > from different HFiles), apply the filter condition plus the restriction
> > to the included column families and qualifiers, honor the time range and
> > max versions, and correctly process deleted cells.
> > A direct HFileReader does none of the above.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [EMAIL PROTECTED]
> >
> > ________________________________________
> > From: Jerry Lam [[EMAIL PROTECTED]]
> > Sent: Thursday, January 02, 2014 7:56 AM
> > To: user
> > Subject: Re: Performance between HBaseClient scan and HFileReaderV2
> >
> > Hi Tom,
> >
> > Good point. Note that I also ran the HBaseClient performance test several
> > times (as you can see from the chart). The caching should also have
> > benefited the second run of the HBaseClient performance test, not just
> > the HFileReaderV2 test.
> >
> > I still don't understand what makes HBaseClient perform so poorly in
> > comparison to accessing HDFS directly. I can understand maybe a factor
> > of 2 (even that is too much), but a factor of 8 is quite unreasonable.
> >
> > Any hint?
> >
> > Jerry
> >
> >
> >
> > On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <[EMAIL PROTECTED]> wrote:
> >
> > > I'm also new to HBase and am not familiar with HFileReaderV2.  However,
> > > in your description, you didn't mention anything about clearing the
> > > Linux OS cache between tests.  That might be why you're seeing the big
> > > difference: if you ran the HBaseClient test first, it may have warmed
> > > the OS cache, and then HFileReaderV2 benefited from it.  Just a guess...
> > >
> > > -- Tom
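
The ordering guarantee Vladimir describes in the thread above (KeyValues
coming from different HFiles must be returned in one sorted order) amounts
to a k-way merge of already-sorted inputs. The sketch below shows that idea
with plain strings standing in for keys; it is not HBase's implementation,
and the class and method names are hypothetical. When there is a single
input, as in Jerry's setup with one HFile after major compaction, the merge
degenerates to a straight sequential read.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a k-way merge; not taken from HBase source code.
final class SortedRunMerge {

    /** Pairs the current head element of one sorted run with the rest of that run. */
    private static final class Cursor {
        final String head;
        final Iterator<String> rest;

        Cursor(String head, Iterator<String> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    /** Merges individually sorted runs into one sorted list using a min-heap. */
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head));
        for (List<String> run : runs) {
            Iterator<String> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Take the smallest head, then advance the run it came from.
            Cursor c = heap.poll();
            out.add(c.head);
            if (c.rest.hasNext()) {
                heap.add(new Cursor(c.rest.next(), c.rest));
            }
        }
        return out;
    }
}
```

Each element passes through the heap once, so the extra cost over a plain
sequential read grows with the number of inputs being merged.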