HBase, mail # user - Optimizing table scans


Re: Optimizing table scans
Amit Sela 2012-09-12, 13:55
I allocate 10GB per RegionServer.
The average row size is ~200 bytes.
The network is 1 GbE.

It would be great if anyone could elaborate on the difference between Cache
and Batch parameters.

Thanks.
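
On the Cache versus Batch question, a rough way to reason about it, going by the HBase client Javadoc rather than anything confirmed in this thread: Scan.setCaching(n) sets how many Results the client fetches per round trip to the RegionServer, while Scan.setBatch(n) caps how many cells (column values) go into a single Result, so a wide row is split across several Results. The sketch below is plain Java arithmetic, not HBase code. The class ScanEstimate, its method names, and the figure of 20 cells per row are all hypothetical, chosen only for illustration.

```java
// Back-of-the-envelope model of Scan caching vs. batch. Not HBase code.
final class ScanEstimate {
    private ScanEstimate() {}

    // Results per row: 1 when batch is unset (<= 0) or at least as wide as the row,
    // otherwise ceil(cellsPerRow / batch).
    static long resultsPerRow(int cellsPerRow, int batch) {
        if (batch <= 0 || batch >= cellsPerRow) {
            return 1;
        }
        return (cellsPerRow + batch - 1) / batch;
    }

    // Total Result objects the client iterates over for a full scan.
    static long results(long rows, int cellsPerRow, int batch) {
        return rows * resultsPerRow(cellsPerRow, batch);
    }

    // Approximate number of fetch round trips: `caching` Results per trip.
    static long fetchRpcs(long results, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (results + caching - 1) / caching;
    }

    // Complete rows covered if the client stops after `iterated` Results.
    static long rowsCovered(long iterated, int cellsPerRow, int batch) {
        return iterated / resultsPerRow(cellsPerRow, batch);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        int cellsPerRow = 20; // hypothetical; the thread does not state the column count
        for (int batch : new int[] {-1, 10, 5, 1}) {
            long results = results(rows, cellsPerRow, batch);
            System.out.printf(
                "batch=%d: %d Results, %d fetch RPCs at caching=10000, "
                    + "%d rows covered by the first %d Results%n",
                batch, results, fetchRpcs(results, 10_000),
                rowsCovered(rows, cellsPerRow, batch), rows);
        }
    }
}
```

If the benchmark loop stopped after 1,000,000 Results rather than 1,000,000 rows, a small batch would mean far fewer rows actually read. That is one possible, unconfirmed reading of the timings quoted below.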

On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> How much memory do you have?
> What's the size of the underlying row?
> What does your network look like? 1 GbE or 10 GbE?
>
> There's more to it, and I think that you'll find that YMMV on what is an
> optimum scan size...
>
> HTH
>
> -Mike
>
> On Sep 12, 2012, at 7:57 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I'm trying to find the sweet spot for the cache size and batch size
> > Scan() parameters.
> >
> > I'm scanning one table using HTable.getScanner() and iterating over the
> > ResultScanner retrieved.
> >
> > I did some testing and got the following results:
> >
> > For scanning *1000000* rows.
> >
> > Cache    Batch           Total execution time (sec)
> > 10000    -1 (default)    112
> > 10000    5000            110
> > 10000    10000           110
> > 10000    20000           110
> >
> > Cache    Batch           Total execution time (sec)
> > 1000     -1 (default)    116
> > 10000    -1 (default)    110
> > 20000    -1 (default)    115
> >
> > Cache    Batch           Total execution time (sec)
> > 5000     10              26
> > 20000    10              25
> > 50000    10              26
> > 5000     5               15
> > 20000    5               14
> > 50000    5               14
> > 1000     1               6
> > 5000     1               5
> > 10000    1               4
> > 20000    1               4
> > 50000    1               4
> >
> > *I don't understand why a lower batch size gives such an improvement.*
> >
> > Thanks,
> >
> > Amit.