

Re: Optimizing table scans
I allocate 10GB per RegionServer.
The average row size is ~200 bytes. The network is 1 GbE.

It would be great if anyone could elaborate on the difference between the Cache and Batch parameters.

Thanks.

On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> How much memory do you have?
> What's the size of the underlying row?
> What does your network look like? 1GBe or 10GBe?
>
> There's more to it, and I think that you'll find that YMMV on what is an
> optimum scan size...
>
> HTH
>
> Mike
>
> On Sep 12, 2012, at 7:57 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I'm trying to find the sweet spot for the cache size and batch size
> > Scan() parameters.
> >
> > I'm scanning one table using HTable.getScanner() and iterating over the
> > ResultScanner retrieved.
> >
> > I did some testing and got the following results:
> >
> > For scanning *1000000* rows.
> >
> > Cache    Batch          Total execution time (sec)
> > 10000    1 (default)    112
> > 10000    5000           110
> > 10000    10000          110
> > 10000    20000          110
> >
> > Cache    Batch          Total execution time (sec)
> > 1000     1 (default)    116
> > 10000    1 (default)    110
> > 20000    1 (default)    115
> >
> > Cache    Batch          Total execution time (sec)
> > 5000     10             26
> > 20000    10             25
> > 50000    10             26
> > 5000     5              15
> > 20000    5              14
> > 50000    5              14
> > 1000     1              6
> > 5000     1              5
> > 10000    1              4
> > 20000    1              4
> > 50000    1              4
> >
> > *I don't understand why a lower batch size gives such an improvement?*
> >
> > Thanks,
> >
> > Amit.
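
On the Cache versus Batch question: below is a rough sketch of the arithmetic, assuming (as the HBase client documentation is usually read) that caching is the number of Result objects the client pulls per round trip, and that batch caps the number of columns in each Result, so a row wider than the batch is split across several Results. The class and method names are made up for illustration, the 20 columns per row is a hypothetical figure (the thread does not give one), and treating a non-positive batch as "no limit" is also an assumption. It only counts round trips and does not try to explain the timings in the tables.

```java
/** Back-of-the-envelope estimate of scan round trips; not part of the HBase API. */
final class ScanRpcEstimate {

    /** Results produced per row: a row wider than `batch` columns is split into several Results. */
    static long resultsPerRow(long columnsPerRow, long batch) {
        // Assumption: a non-positive batch means "no limit", i.e. one Result per row.
        long perResult = (batch <= 0) ? columnsPerRow : Math.min(columnsPerRow, batch);
        return (columnsPerRow + perResult - 1) / perResult; // ceiling division
    }

    /** Data-fetching round trips: each one carries up to `caching` Results. */
    static long fetchRpcs(long rows, long columnsPerRow, long caching, long batch) {
        long results = rows * resultsPerRow(columnsPerRow, batch);
        return (results + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;   // row count used in the tests quoted above
        long columnsPerRow = 20L; // hypothetical; not stated in the thread

        long[][] settings = { {1_000, 0}, {10_000, 0}, {10_000, 5}, {10_000, 10} };
        for (long[] s : settings) {
            long caching = s[0];
            long batch = s[1];
            System.out.printf("caching=%d batch=%d -> results/row=%d, fetch RPCs=%d%n",
                    caching, batch,
                    resultsPerRow(columnsPerRow, batch),
                    fetchRpcs(rows, columnsPerRow, caching, batch));
        }
    }
}
```

Under these assumptions, raising caching reduces the number of round trips, while lowering batch below the row width increases the number of Results (and therefore round trips) for the same data.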

