HBase user mailing list: Optimizing table scans


Re: Optimizing table scans
I allocate 10GB per RegionServer.
The average row size is ~200 bytes.
The network is 1GbE.

It would be great if anyone could elaborate on the difference between the Cache
and Batch parameters.

Thanks.
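
For context on the question above: in the HBase client API of that era (0.92/0.94), setCaching controls how many rows the scanner fetches from the RegionServer per RPC round trip, while setBatch caps how many columns (KeyValues) are returned in each Result, so a row with more columns than the batch value is split across several Result objects. A minimal sketch of setting the two parameters (the values are illustrative, not taken from the thread):

    import org.apache.hadoop.hbase.client.Scan;

    public class ScanSettingsSketch {
        // Sketch only: the caching/batch values are illustrative.
        static Scan configuredScan() {
            Scan scan = new Scan();
            // Caching: rows buffered on the client per RPC round trip.
            // Larger values mean fewer RPCs, but more memory per call.
            scan.setCaching(10000);
            // Batch: maximum columns (KeyValues) per Result. A row with more
            // columns than this is split into several Result objects.
            // The default (-1) returns all of a row's columns in one Result.
            scan.setBatch(5);
            return scan;
        }
    }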

On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> How much memory do you have?
> What's the size of the underlying row?
> What does your network look like? 1GbE or 10GbE?
>
> There's more to it, and I think that you'll find that YMMV on what is an
> optimum scan size...
>
> HTH
>
> -Mike
>
> On Sep 12, 2012, at 7:57 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I'm trying to find the sweet spot for the cache size and batch size
> > Scan() parameters.
> >
> > I'm scanning one table using HTable.getScanner() and iterating over the
> > ResultScanner retrieved.
> >
> > I did some testing and got the following results:
> >
> > For scanning 1,000,000 rows:
> >
> >   Cache   Batch          Total execution time (sec)
> >   10000   -1 (default)   112
> >   10000   5000           110
> >   10000   10000          110
> >   10000   20000          110
> >
> >   Cache   Batch          Total execution time (sec)
> >   1000    -1 (default)   116
> >   10000   -1 (default)   110
> >   20000   -1 (default)   115
> >
> >   Cache   Batch          Total execution time (sec)
> >   5000    10             26
> >   20000   10             25
> >   50000   10             26
> >   5000    5              15
> >   20000   5              14
> >   50000   5              14
> >   1000    1              6
> >   5000    1              5
> >   10000   1              4
> >   20000   1              4
> >   50000   1              4
> >
> > I don't understand why a lower batch size gives such an improvement?
> >
> > Thanks,
> >
> > Amit.
>
>
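
Below is a sketch of the scan-and-iterate approach described in the quoted message (HTable.getScanner plus iteration over the ResultScanner), again assuming a 0.92/0.94-era client; the table and column family names are placeholders. One thing worth checking against the timings above: when batch is smaller than the number of columns in a row, each physical row comes back as several Result objects, so a loop that stops after a fixed number of Results has read fewer full rows, which could account for at least part of the apparent speed-up at small batch values.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanLoopSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // "mytable" and "cf" are placeholder names, not from the thread.
            HTable table = new HTable(conf, "mytable");
            try {
                Scan scan = new Scan();
                scan.addFamily(Bytes.toBytes("cf"));
                scan.setCaching(10000);
                scan.setBatch(5);

                long results = 0;
                long rows = 0;
                byte[] lastRow = null;
                ResultScanner scanner = table.getScanner(scan);
                try {
                    for (Result r : scanner) {
                        results++;
                        // With a small batch, one physical row can arrive as
                        // several Results sharing the same row key, so count
                        // distinct row keys separately from Result objects.
                        if (lastRow == null || !Bytes.equals(lastRow, r.getRow())) {
                            rows++;
                            lastRow = r.getRow();
                        }
                    }
                } finally {
                    scanner.close();
                }
                System.out.println("Results: " + results + ", distinct rows: " + rows);
            } finally {
                table.close();
            }
        }
    }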