HBase, mail # user - Optimizing table scans


Optimizing table scans
Amit Sela 2012-09-12, 12:57
Hi all,

I'm trying to find the sweet spot for the caching and batch settings of a
Scan (Scan.setCaching() and Scan.setBatch()).

I'm scanning one table using HTable.getScanner() and iterating over the
ResultScanner retrieved.
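
For reference, here is a rough sketch of how the batch setting changes what one iteration of the ResultScanner returns. Per the HBase Javadoc, setBatch() sets the maximum number of values returned for each call to next(), so a row wider than the batch is split across several Result objects. The class name ScanSizing, its methods, and the figure of 20 columns per row are assumptions made up for illustration; the column count of the table under test is not stated here. It is plain arithmetic and makes no HBase calls.

```java
/** Back-of-the-envelope arithmetic for Scan batch settings (illustrative only). */
final class ScanSizing {

    /**
     * Number of Result objects a single row is split into.
     * A batch of zero or less stands for "no limit" (the -1 default).
     */
    static long resultsPerRow(int columnsPerRow, int batch) {
        if (batch <= 0) {
            return 1; // the whole row comes back in one Result
        }
        return (columnsPerRow + (long) batch - 1) / batch; // ceiling division
    }

    /**
     * Cells read if the client stops iterating after `results` Result objects.
     * Assumes every row has exactly `columnsPerRow` cells (hypothetical).
     */
    static long cellsFetched(long results, int columnsPerRow, int batch) {
        long perRow = resultsPerRow(columnsPerRow, batch);
        long fullRows = results / perRow;   // rows read completely
        long partial = results % perRow;    // Results taken from one more row
        long partialCells = (batch <= 0) ? 0 : Math.min(partial * batch, columnsPerRow);
        return fullRows * columnsPerRow + partialCells;
    }

    public static void main(String[] args) {
        int columnsPerRow = 20;      // assumed, not taken from the test
        long results = 1_000_000L;   // Results iterated by the client
        for (int batch : new int[] {-1, 5000, 10, 5, 1}) {
            System.out.println("batch=" + batch
                    + " resultsPerRow=" + resultsPerRow(columnsPerRow, batch)
                    + " cellsFetched=" + cellsFetched(results, columnsPerRow, batch));
        }
    }
}
```

Under the assumed 20 columns per row, stopping after 1000000 Result objects reads 20000000 cells with the default batch but only 1000000 cells with a batch of 1. So if a test counts Result objects rather than complete rows, runs with different batch values may not be reading the same amount of data. Whether that applies to the numbers in this message is not known.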

I did some testing and got the following results:

For scanning 1000000 rows:

Cache    Batch           Total execution time (sec)
10000    -1 (default)    112
10000    5000            110
10000    10000           110
10000    20000           110

Cache    Batch           Total execution time (sec)
1000     -1 (default)    116
10000    -1 (default)    110
20000    -1 (default)    115

Cache    Batch           Total execution time (sec)
5000     10              26
20000    10              25
50000    10              26
5000     5               15
20000    5               14
50000    5               14
1000     1               6
5000     1               5
10000    1               4
20000    1               4
50000    1               4

*I don't understand why a lower batch size gives such an improvement.*

Thanks,

Amit.