Optimizing table scans
Hi all,

I'm trying to find the sweet spot for the cache size and batch size parameters
of Scan.

I'm scanning one table using HTable.getScanner() and iterating over the
returned ResultScanner.
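
To make the two parameters concrete, here is a rough, self-contained model of how they might interact. It is only a sketch and assumes the semantics described in the HBase Scan documentation: the batch size caps the number of cells in each Result (so a wide row is split across several Results), and the cache size is the number of Results fetched per RPC. The class name, the method names, and the 30 cells per row are hypothetical; the post does not state how wide the rows are or whether the loop counts rows or Results.

```java
public class ScanModel {

    /** Results produced by one row; batch <= 0 stands for "no batching" (the -1 default). */
    static long resultsPerRow(long cellsPerRow, long batch) {
        if (batch <= 0 || batch >= cellsPerRow) {
            return 1; // the whole row fits into a single Result
        }
        return (cellsPerRow + batch - 1) / batch; // ceiling division
    }

    /** Cells actually transferred after iterating over the first `results` Results. */
    static long cellsFetched(long results, long cellsPerRow, long batch) {
        long perRow = resultsPerRow(cellsPerRow, batch);
        long fullRows = results / perRow;
        long partial = results % perRow; // leftover Results from a row cut short (0 without batching)
        return fullRows * cellsPerRow + partial * batch;
    }

    /** Approximate number of fetch RPCs, ignoring the scanner open and close calls. */
    static long fetchRpcs(long results, long caching) {
        return (results + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long results = 1_000_000L; // number of Results the loop iterates over
        long cellsPerRow = 30;     // hypothetical row width, not taken from the post
        long caching = 10_000;
        for (long batch : new long[] {-1, 10, 5, 1}) {
            System.out.printf("batch=%d resultsPerRow=%d cells=%d rpcs=%d%n",
                    batch,
                    resultsPerRow(cellsPerRow, batch),
                    cellsFetched(results, cellsPerRow, batch),
                    fetchRpcs(results, caching));
        }
    }
}
```

Under these assumptions, a loop that stops after a fixed number of Results transfers fewer cells as the batch size shrinks, while the RPC count stays the same. Whether that applies here depends on how the test loop counts its 1000000 rows.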

I did some testing and got the following results:

For scanning *1000000* rows.

Cache    Batch           Total execution time (sec)
10000    -1 (default)    112
10000    5000            110
10000    10000           110
10000    20000           110

Cache    Batch           Total execution time (sec)
1000     -1 (default)    116
10000    -1 (default)    110
20000    -1 (default)    115

Cache    Batch           Total execution time (sec)
5000     10              26
20000    10              25
50000    10              26
5000     5               15
20000    5               14
50000    5               14
1000     1               6
5000     1               5
10000    1               4
20000    1               4
50000    1               4

*I don't understand why a lower batch size gives such an improvement?*

Thanks,

Amit.