NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase user mailing list: Performance between HBaseClient scan and HFileReaderV2


Jerry Lam 2013-12-23, 20:18
Tom Hood 2013-12-30, 02:09
Jerry Lam 2014-01-02, 15:56
Vladimir Rodionov 2014-01-02, 18:30
Jean-Marc Spaggiari 2014-01-02, 18:35
Jerry Lam 2014-01-02, 21:32
Sergey Shelukhin 2014-01-02, 21:42
Sergey Shelukhin 2014-01-02, 21:43
Enis Söztutar 2014-01-02, 22:02
Jerry Lam 2014-01-02, 23:31
Ted Yu 2014-01-02, 23:35
lars hofhansl 2014-01-02, 21:45
lars hofhansl 2014-01-02, 21:44
Jerry Lam 2014-01-02, 23:53
Stack 2014-01-02, 16:23
Re: Performance between HBaseClient scan and HFileReaderV2
Hello St.Ack,

I would like to switch to 0.94, but we are using 0.92.1 and will not
change until the end of 2014. I can change the HBase "client" (e.g.
AsyncHBase) if that is the bottleneck. If the problem is on the server side
(e.g. the regionserver), is there anything I can do to improve performance?

Best Regards,

Jerry
On Thu, Jan 2, 2014 at 11:23 AM, Stack <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>
> > Hello HBase users,
> >
> > I just ran a very simple performance test and would like to know whether
> > what I experienced makes sense.
> >
> > The experiment is as follows:
> > - I filled an HBase region with 700MB of data (each row has roughly 45
> > columns, and the entire row is about 20KB)
> > - I configured the region to hold 4GB (therefore no split occurs)
> > - I ran compactions after the data was loaded and made sure that there is
> > only 1 region in the table under test.
> > - No other table exists in the HBase cluster because this is a DEV
> > environment
> > - I'm using HBase 0.92.1
> >
> >
> Can you use 0.94?  It has had some scanner improvements.
>
> Thanks,
> St.Ack
>
>
>
> > The test is very basic. I use HBaseClient to scan the entire region to
> > retrieve all rows and all columns in the table, just iterating over all
> > KeyValue pairs until the scan completes. It took about 1 minute 22 sec to
> > complete. (Note that I disabled the block cache and used a scanner caching
> > size of about 10000.)
> >
> > I ran another test using HFileReaderV2 to scan the entire region and
> > retrieve all rows and all columns, just iterating over all KeyValue pairs
> > until the scan completes. It took 11 sec.
> >
> > The performance difference is dramatic (roughly 7.5 times faster using
> > HFileReaderV2).
> >
> > I want to know why the difference is so big, or whether I didn't configure
> > HBase properly. This experiment suggests that HDFS can deliver the data
> > efficiently, so it is not the bottleneck.
> >
> > Any help is appreciated!
> >
> > Jerry
> >
> >
>
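A rough back-of-the-envelope check of the figures quoted in this thread may help frame the question. This is illustrative arithmetic only, not a benchmark. It assumes 1 MB = 1024 KB, that every row is about 20KB with 45 columns (one KeyValue per column), and that the scanner caching value counts rows returned per scanner RPC. The helpers `estimate`, `throughput_mb_s` and `micros_per_keyvalue` are hypothetical names introduced here.

```python
import math

# Figures quoted in the thread. Treating 1 MB as 1024 KB is an assumption.
DATA_MB = 700
ROW_KB = 20
COLUMNS_PER_ROW = 45          # assumed: one KeyValue per column per row
SCANNER_CACHING = 10_000      # assumed: rows returned per scanner RPC
CLIENT_SCAN_S = 82            # 1 min 22 sec through the HBase client
HFILE_SCAN_S = 11             # direct read with HFileReaderV2


def estimate(data_mb, row_kb, columns, caching):
    """Return (rows, KeyValues, row batches) implied by the data size."""
    rows = data_mb * 1024 // row_kb
    keyvalues = rows * columns
    batches = math.ceil(rows / caching)  # ignores scanner open/close calls
    return rows, keyvalues, batches


def throughput_mb_s(data_mb, seconds):
    """Average read throughput in MB per second."""
    return data_mb / seconds


def micros_per_keyvalue(seconds, keyvalues):
    """Average wall-clock time spent per KeyValue, in microseconds."""
    return seconds * 1e6 / keyvalues


if __name__ == "__main__":
    rows, kvs, batches = estimate(DATA_MB, ROW_KB, COLUMNS_PER_ROW, SCANNER_CACHING)
    print(f"rows ~{rows}, KeyValues ~{kvs}, row batches ~{batches}")
    for label, secs in (("client scan", CLIENT_SCAN_S), ("HFileReaderV2", HFILE_SCAN_S)):
        print(f"{label}: {throughput_mb_s(DATA_MB, secs):.1f} MB/s, "
              f"{micros_per_keyvalue(secs, kvs):.1f} us per KeyValue")
    print(f"speedup: {CLIENT_SCAN_S / HFILE_SCAN_S:.2f}x")
```

Under these assumptions the region holds about 35,840 rows and 1.6 million KeyValues, so the client path averages roughly 8.5 MB/s (about 51 microseconds per KeyValue) against roughly 63.6 MB/s (about 7 microseconds) for the direct HFile read, while needing only about four row batches at caching 10000. The arithmetic alone does not say where that per-KeyValue time goes.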
Andrew Purtell 2014-01-02, 17:47
lars hofhansl 2014-01-02, 18:54