Re: Performance between HBaseClient scan and HFileReaderV2
https://issues.apache.org/jira/browse/HBASE-9272 has been opened for
unordered scans. I can see some use cases for that: when you scan over
multiple regions but just want to get the results as fast as possible...
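
To make the ordering point in this thread concrete, here is a minimal sketch in plain Python (no HBase dependency) of the extra work an ordered scan does compared with reading files one after another. Everything in it is an assumption for illustration: the simplified cell tuple, the `sort_key` and `merged_scan` names, and the delete handling are hypothetical and are not HBase's actual implementation.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical simplified cell: (row, column, timestamp, kind, value).
# kind is "put" or "delete"; real HBase KeyValues carry more fields.
Cell = Tuple[str, str, int, str, str]


def sort_key(cell: Cell):
    row, column, ts, kind, _ = cell
    # Newest timestamp first within a column; a delete sorts ahead of a put
    # with the same timestamp so that it can mask it.
    return (row, column, -ts, 0 if kind == "delete" else 1)


def merged_scan(files: List[Iterable[Cell]], max_versions: int = 1) -> Iterator[Cell]:
    """Merge per-file sorted cells, then apply delete markers and max versions.

    Each input must already be sorted by sort_key; timestamps are assumed
    to be non-negative.
    """
    current = None       # (row, column) currently being processed
    deleted_up_to = -1   # newest delete timestamp seen for this column
    emitted = 0          # versions already returned for this column
    for cell in heapq.merge(*files, key=sort_key):
        row, column, ts, kind, _ = cell
        if (row, column) != current:
            current, deleted_up_to, emitted = (row, column), -1, 0
        if kind == "delete":
            deleted_up_to = max(deleted_up_to, ts)
        elif ts > deleted_up_to and emitted < max_versions:
            emitted += 1
            yield cell


# Two made-up "files", each sorted on its own.
file_a = [("r1", "c1", 1, "put", "old"), ("r2", "c1", 1, "put", "x")]
file_b = [("r1", "c1", 2, "put", "new"), ("r2", "c1", 2, "delete", "")]

result = list(merged_scan([file_a, file_b]))
print(result)  # [('r1', 'c1', 2, 'put', 'new')]
```

Reading the two lists back to back would return all four cells, including the stale version of r1 and the deleted r2. An unordered scan could skip the merge step, but something would still have to resolve deletes and versions.
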
2014/1/2 Vladimir Rodionov <[EMAIL PROTECTED]>

> The HBase scanner MUST guarantee the correct order of KeyValues (coming
> from different HFiles), apply filter conditions plus the conditions on
> included column families and qualifiers, time range, and max versions,
> and correctly process deleted cells.
> A direct HFileReader does none of the above.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: Jerry Lam [[EMAIL PROTECTED]]
> Sent: Thursday, January 02, 2014 7:56 AM
> To: user
> Subject: Re: Performance between HBaseClient scan and HFileReaderV2
>
> Hi Tom,
>
> Good point. Note that I also ran the HBaseClient performance test several
> times (as you can see from the chart). The caching should also have
> benefited the second run of the HBaseClient performance test, not just the
> HFileReaderV2 test.
>
> I still don't understand what makes the HBaseClient perform so poorly in
> comparison to accessing HDFS directly. I can understand maybe a factor of 2
> (even that is too much), but a factor of 8 is quite unreasonable.
>
> Any hint?
>
> Jerry
>
>
>
> On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood <[EMAIL PROTECTED]> wrote:
>
> > I'm also new to HBase and am not familiar with HFileReaderV2.  However, in
> > your description, you didn't mention anything about clearing the Linux OS
> > cache between tests.  That might be why you're seeing the big difference:
> > if you ran the HBaseClient test first, it may have warmed the OS cache,
> > and then HFileReaderV2 benefited from it.  Just a guess...
> >
> > -- Tom
> >
> >
> >
> > On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> >
> > > Hello HBase users,
> > >
> > > I just ran a very simple performance test and would like to see if what
> > > I experienced makes sense.
> > >
> > > The experiment is as follows:
> > > - I filled an HBase region with 700MB of data (each row has roughly 45
> > > columns and the entire row is about 20KB)
> > > - I configured the region to hold 4GB (therefore no split occurs)
> > > - I ran compactions after the data was loaded and made sure that there
> > > is only 1 region in the table under test.
> > > - No other table exists in the HBase cluster because this is a DEV
> > > environment
> > > - I'm using HBase 0.92.1
> > >
> > > The test is very basic. I used HBaseClient to scan the entire region and
> > > retrieve all rows and all columns in the table, just iterating over all
> > > KeyValue pairs until done. It took about 1 minute 22 sec to complete.
> > > (Note that I disabled the block cache and used a caching size of about
> > > 10000.)
> > >
> > > I ran another test using HFileReaderV2 to scan the entire region and
> > > retrieve all rows and all columns, just iterating over all KeyValue
> > > pairs until done. It took 11 sec.
> > >
> > > The performance difference is dramatic (almost 8 times faster using
> > > HFileReaderV2).
> > >
> > > I want to know why the difference is so big, or whether I didn't
> > > configure HBase properly. From this experiment, HDFS can deliver the
> > > data efficiently, so it is not the bottleneck.
> > >
> > > Any help is appreciated!
> > >
> > > Jerry
> > >
> > >
> >
>
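
For reference, the figures quoted in the original post can be turned into throughput numbers with a few lines of Python. This is only back-of-the-envelope arithmetic on the reported values; it assumes 1MB = 1024KB and that the "caching size" of 10000 means rows fetched per round trip, which the post does not state explicitly.

```python
import math

# Values reported in the thread.
DATA_MB = 700          # data loaded into the single region
ROW_KB = 20            # approximate size of one row
CLIENT_SECONDS = 82    # 1 minute 22 sec via HBaseClient
HFILE_SECONDS = 11     # via HFileReaderV2
CACHING_ROWS = 10000   # assumed to be rows per fetch (not confirmed in the post)

client_mb_per_s = DATA_MB / CLIENT_SECONDS   # throughput through the client scan
hfile_mb_per_s = DATA_MB / HFILE_SECONDS     # throughput reading the HFile directly
speedup = CLIENT_SECONDS / HFILE_SECONDS     # how much faster the direct read was

rows = DATA_MB * 1024 // ROW_KB              # approximate row count
batch_mb = CACHING_ROWS * ROW_KB / 1024      # data per fetch under the assumption above
fetches = math.ceil(rows / CACHING_ROWS)     # round trips needed for the whole region

print(f"client: {client_mb_per_s:.1f} MB/s, direct: {hfile_mb_per_s:.1f} MB/s")
print(f"speedup: {speedup:.2f}x, rows: {rows}, fetches: {fetches}, "
      f"per fetch: {batch_mb:.0f} MB")
```

Under these assumptions the client scan moves roughly 8.5 MB/s against roughly 64 MB/s for the direct read, and each fetch would carry close to 200MB, so the whole region arrives in only four round trips.
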