HBase, mail # user - Optimizing table scans


Re: Optimizing table scans
Michael Segel 2012-09-12, 13:04
How much memory do you have?
What's the size of the underlying row?
What does your network look like? 1GBe or 10GBe?

There's more to it, and I think that you'll find that YMMV on what is an optimum scan size...

HTH

-Mike

On Sep 12, 2012, at 7:57 AM, Amit Sela <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I'm trying to find the sweet spot for the cache size and batch size Scan()
> parameters.
>
> I'm scanning one table using HTable.getScanner() and iterating over the
> ResultScanner retrieved.
>
> I did some testing and got the following results:
>
> For scanning 1,000,000 rows:
>
>   Cache   Batch          Total execution time (sec)
>   10000   -1 (default)   112
>   10000   5000           110
>   10000   10000          110
>   10000   20000          110
>
>   Cache   Batch          Total execution time (sec)
>   1000    -1 (default)   116
>   10000   -1 (default)   110
>   20000   -1 (default)   115
>
>   Cache   Batch          Total execution time (sec)
>   5000    10             26
>   20000   10             25
>   50000   10             26
>   5000    5              15
>   20000   5              14
>   50000   5              14
>   1000    1              6
>   5000    1              5
>   10000   1              4
>   20000   1              4
>   50000   1              4
>
> I don't understand why a lower batch size gives such an improvement?
>
> Thanks,
>
> Amit.
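
The setup Amit describes can be sketched roughly as below. This is a minimal sketch, not his actual code: the table name, the row-counting loop, and the specific caching/batch values are assumptions, and it uses the classic pre-1.0 HTable API that was current for this 2012 thread.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanTuning {

    // Build a Scan with the two knobs being tuned:
    // caching = number of rows buffered per RPC to the region server,
    // batch   = max number of columns returned per Result.
    static Scan buildScan(int caching, int batch) {
        Scan scan = new Scan();
        scan.setCaching(caching);
        if (batch > 0) {
            scan.setBatch(batch); // leaving it unset (-1) returns whole rows
        }
        return scan;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // "myTable" is a placeholder name, not from the original message.
        HTable table = new HTable(conf, "myTable");
        ResultScanner scanner = table.getScanner(buildScan(10000, 5));
        try {
            long results = 0;
            for (Result r : scanner) {
                // Note: with batch set, a single row can be split
                // across several Result objects.
                results++;
            }
            System.out.println("Iterated " + results + " Results");
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```

One thing worth checking when comparing timings like the tables above: since `setBatch` splits wide rows into multiple `Result` objects, a loop that counts `Result` iterations is not counting the same unit of work at batch 1 as at batch -1.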