HBase, mail # user - Optimizing table scans


Re: Optimizing table scans
Doug Meil 2012-09-12, 14:37

Hi there,

See this for info on the block cache in the RegionServer:

http://hbase.apache.org/book.html
9.6.4. Block Cache

... and see this for "batching" on the scan parameter:

http://hbase.apache.org/book.html#perf.reading
11.8.1. Scan Caching
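
To make the difference between the two parameters concrete, here is a rough sketch in plain Java with no HBase dependency. It assumes the documented meaning of the two settings as I understand them: Scan.setCaching(n) controls how many Results are fetched per round trip, and Scan.setBatch(n) caps the number of columns in a single Result, so a wide row is split across several Results. The class ScanMath, its method names, and the figure of 20 columns per row are made up for illustration; the thread does not state the real column count.

```java
// Back-of-the-envelope model of how caching and batch interact.
// Hypothetical helper, not part of the HBase API.
class ScanMath {

    // Number of Result objects the client iterates over for a full scan.
    // batch <= 0 models the default (-1): one Result per row.
    static long resultsForRows(long rows, int colsPerRow, int batch) {
        if (batch <= 0 || batch >= colsPerRow) {
            return rows;
        }
        long resultsPerRow = (colsPerRow + batch - 1) / batch; // ceiling division
        return rows * resultsPerRow;
    }

    // Approximate number of fetch round trips, ignoring scanner open/close calls.
    static long rpcs(long results, int caching) {
        return (results + caching - 1) / caching; // ceiling division
    }

    // Rows actually covered if the client stops after a fixed number of
    // ResultScanner.next() iterations instead of a fixed number of rows.
    static long rowsCovered(long iterations, int colsPerRow, int batch) {
        if (batch <= 0 || batch >= colsPerRow) {
            return iterations;
        }
        long resultsPerRow = (colsPerRow + batch - 1) / batch;
        return iterations / resultsPerRow;
    }

    public static void main(String[] args) {
        int cols = 20; // assumed column count, not taken from the thread
        long results = resultsForRows(1_000_000L, cols, 5);
        System.out.println("Results for 1M rows, batch=5: " + results);
        System.out.println("Fetch RPCs with caching=5000: " + rpcs(results, 5000));
        System.out.println("Rows covered by 1M iterations, batch=1: "
                + rowsCovered(1_000_000L, cols, 1));
    }
}
```

If the test loop counts Results rather than rows, a small batch means each iteration carries only part of a row, so a fixed number of iterations covers far fewer rows and moves far less data. That is one possible reading of the timings quoted below, not something the thread confirms.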
On 9/12/12 9:55 AM, "Amit Sela" <[EMAIL PROTECTED]> wrote:

>I allocate 10GB per RegionServer.
>An average row size is ~200 Bytes.
>The network is 1 GbE.
>
>It would be great if anyone could elaborate on the difference between the
>Cache and Batch parameters.
>
>Thanks.
>
>On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel
><[EMAIL PROTECTED]>wrote:
>
>> How much memory do you have?
>> What's the size of the underlying row?
>> What does your network look like? 1 GbE or 10 GbE?
>>
>> There's more to it, and I think that you'll find that YMMV on what is an
>> optimum scan size...
>>
>> HTH
>>
>> -Mike
>>
>> On Sep 12, 2012, at 7:57 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
>>
>> > Hi all,
>> >
>> > I'm trying to find the sweet spot for the cache size and batch size
>> > Scan() parameters.
>> >
>> > I'm scanning one table using HTable.getScanner() and iterating over
>> > the ResultScanner retrieved.
>> >
>> > I did some testing and got the following results:
>> >
>> > For scanning 1,000,000 rows:
>> >
>> > Cache   Batch          Total execution time (sec)
>> > 10000   -1 (default)   112
>> > 10000   5000           110
>> > 10000   10000          110
>> > 10000   20000          110
>> >
>> > Cache   Batch          Total execution time (sec)
>> > 1000    -1 (default)   116
>> > 10000   -1 (default)   110
>> > 20000   -1 (default)   115
>> >
>> > Cache   Batch          Total execution time (sec)
>> > 5000    10             26
>> > 20000   10             25
>> > 50000   10             26
>> > 5000    5              15
>> > 20000   5              14
>> > 50000   5              14
>> > 1000    1              6
>> > 5000    1              5
>> > 10000   1              4
>> > 20000   1              4
>> > 50000   1              4
>> >
>> > I don't understand why a lower batch size gives such an improvement.
>> >
>> > Thanks,
>> >
>> > Amit.
>>
>>