HBase >> mail # user >> Full table scan from random starting point?

Full table scan from random starting point?
Let's say I have one client on each of my regionservers.  Each client needs
to do a full scan on the same table.  The order in which the rows are
scanned by clients does not matter.

Is it possible to have each client start at a random point in the table (or
better, at the first row hosted on its local RS), so that if I start them
all at once they don't all peg the same RS for reads?

Example (to keep it simple, assume 3 RS):

RS1: rows 1-2
RS2: rows 3-4
RS3: rows 5-6

client1 (on RS1) reads rows: 1, 2, 3, 4, 5, 6
client2 (on RS2) reads rows: 3, 4, 5, 6, 1, 2
client3 (on RS3) reads rows: 5, 6, 1, 2, 3, 4

Obviously they may progress at different rates and still wind up hitting
the same RSs, but at least we can start out a bit more distributed.

Is this easily possible, without first obtaining a list of all rows and
manually batching them up?
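
To make the access pattern in the example concrete, here is a rough sketch in
plain Python. It does not use the HBase client at all: the table is a toy
sorted list of row keys, and the function names (wraparound_ranges, scan,
full_scan_from) are made up for illustration. It assumes the only primitive
needed is a range scan bounded by a start row and a stop row, and that each
client already knows some start key to begin from (for example, the start key
of a region on its local RS). A wrapped full scan then becomes two range
scans: from the start key to the end of the table, then from the beginning of
the table up to the start key.

```python
from bisect import bisect_left

# Toy table: sorted row keys, matching the 3-RS example above.
ROWS = ["1", "2", "3", "4", "5", "6"]


def wraparound_ranges(start_key):
    """Return half-open (start, stop) ranges that cover the table exactly once,
    beginning at start_key and wrapping around to the first row.

    "" stands for the beginning of the table; a stop of None means
    "scan to the end of the table".
    """
    if start_key == "":
        return [("", None)]
    return [(start_key, None), ("", start_key)]


def scan(rows, start, stop):
    """Toy stand-in for a range scan: rows with start <= key < stop."""
    lo = bisect_left(rows, start)
    hi = len(rows) if stop is None else bisect_left(rows, stop)
    return rows[lo:hi]


def full_scan_from(rows, start_key):
    """Read every row once, starting at start_key and wrapping around."""
    result = []
    for start, stop in wraparound_ranges(start_key):
        result.extend(scan(rows, start, stop))
    return result
```

With this, full_scan_from(ROWS, "3") visits the rows in the order shown for
client2, and full_scan_from(ROWS, "5") in the order shown for client3. In a
real client the two ranges would presumably map to two consecutive scans, one
per (start, stop) pair, so no list of all rows has to be collected up front.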