HBase, mail # user - Optimizing table scans


Re: Optimizing table scans
Amit Sela 2012-09-15, 09:11
So, just to get it straight: the reason the scan with setBatch(1) is so much
faster is that it returns only the value for the first column?
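A rough way to reason about the question above is to model what the two
settings control. This is a minimal sketch, not HBase code: it assumes the
documented behavior that Scan.setBatch(n) caps the number of cells in each
Result (so a wide row is split across several Results, rather than only its
first column being returned) and that Scan.setCaching(n) sets how many
Results are fetched per round trip. The class ScanBatchModel, its method
names, and the 40 cells per row are all hypothetical and chosen only for
illustration; the round-trip count ignores scanner open/close calls and
region boundaries.

```java
public class ScanBatchModel {

    // Results produced per row. batch <= 0 stands for the default (-1),
    // which is assumed to mean one Result per whole row.
    public static long resultsPerRow(int cellsPerRow, int batch) {
        if (batch <= 0 || batch >= cellsPerRow) {
            return 1;
        }
        return (cellsPerRow + batch - 1) / batch; // ceiling division
    }

    // Cells delivered once the client has consumed `results` Result objects.
    public static long cellsRead(long results, int cellsPerRow, int batch) {
        long perRow = resultsPerRow(cellsPerRow, batch);
        if (perRow == 1) {
            return results * cellsPerRow;
        }
        long completeRows = results / perRow;
        long partialResults = results % perRow; // Results of an unfinished row
        return completeRows * cellsPerRow + partialResults * batch;
    }

    // Approximate number of fetch round trips for `results` Results.
    public static long roundTrips(long results, int caching) {
        return (results + caching - 1) / caching;
    }

    public static void main(String[] args) {
        int cellsPerRow = 40;      // hypothetical row width, not from the thread
        long consumed = 1_000_000; // Results pulled from the scanner
        int caching = 10_000;

        for (int batch : new int[] {-1, 10, 5, 1}) {
            long cells = cellsRead(consumed, cellsPerRow, batch);
            long rows = consumed / resultsPerRow(cellsPerRow, batch);
            System.out.printf(
                "batch=%d cells=%d completeRows=%d roundTrips=%d%n",
                batch, cells, rows, roundTrips(consumed, caching));
        }
    }
}
```

Under these assumptions, a test that stops after 1,000,000 Results covers
far fewer rows and cells as the batch size shrinks, which would be
consistent with run times that fall roughly in proportion to the batch
size. Whether that is what happened in the test quoted below depends on
how the rows were counted.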

On Wed, Sep 12, 2012 at 5:37 PM, Doug Meil <[EMAIL PROTECTED]> wrote:

>
> Hi there,
>
> See this for info on the block cache in the RegionServer:
>
> http://hbase.apache.org/book.html
> 9.6.4. Block Cache
>
> ... and see this for "batching" on the scan parameter:
>
> http://hbase.apache.org/book.html#perf.reading
> 11.8.1. Scan Caching
>
>
>
>
>
>
> On 9/12/12 9:55 AM, "Amit Sela" <[EMAIL PROTECTED]> wrote:
>
> >I allocate 10GB per RegionServer.
> >An average row size is ~200 Bytes.
> >The network is 1GB.
> >
> >It would be great if anyone could elaborate on the difference between
> >Cache and Batch parameters.
> >
> >Thanks.
> >
> >On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel
> ><[EMAIL PROTECTED]> wrote:
> >
> >> How much memory do you have?
> >> What's the size of the underlying row?
> >> What does your network look like? 1GBe or 10GBe?
> >>
> >> There's more to it, and I think that you'll find that YMMV on what is an
> >> optimum scan size...
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Sep 12, 2012, at 7:57 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I'm trying to find the sweet spot for the cache size and batch size
> >> > Scan() parameters.
> >> >
> >> > I'm scanning one table using HTable.getScanner() and iterating over the
> >> > ResultScanner retrieved.
> >> >
> >> > I did some testing and got the following results:
> >> >
> >> > For scanning *1000000* rows.
> >> >
> >> > Cache   Batch          Total execution time (sec)
> >> > -----   ------------   --------------------------
> >> > 10000   -1 (default)   112
> >> > 10000   5000           110
> >> > 10000   10000          110
> >> > 10000   20000          110
> >> >
> >> > Cache   Batch          Total execution time (sec)
> >> > -----   ------------   --------------------------
> >> > 1000    -1 (default)   116
> >> > 10000   -1 (default)   110
> >> > 20000   -1 (default)   115
> >> >
> >> > Cache   Batch          Total execution time (sec)
> >> > -----   ------------   --------------------------
> >> > 5000    10             26
> >> > 20000   10             25
> >> > 50000   10             26
> >> > 5000    5              15
> >> > 20000   5              14
> >> > 50000   5              14
> >> > 1000    1              6
> >> > 5000    1              5
> >> > 10000   1              4
> >> > 20000   1              4
> >> > 50000   1              4
> >> >
> >> > *I don't understand why a lower batch size gives such an improvement?*
> >> >
> >> > Thanks,
> >> >
> >> > Amit.
> >>
> >>
>
>
>