HBase, mail # user - Full table scan from random starting point? - 2014-01-31, 22:17
Full table scan from random starting point?
Let's say I have one client on each of my regionservers.  Each client needs
to do a full scan on the same table.  The order in which the rows are
scanned by clients does not matter.

Is it possible to have each client start at a random point in the table
(or better, at the first row hosted on its local regionserver), so that if
I start all of them at once they don't all peg the same regionserver for reads?

Example (to keep it simple, assume 3 RS):

RS1: rows 1-2
RS2: rows 3-4
RS3: rows 5-6

client1 (on RS1) reads rows: 1, 2, 3, 4, 5, 6
client2 (on RS2) reads rows: 3, 4, 5, 6, 1, 2
client3 (on RS3) reads rows: 5, 6, 1, 2, 3, 4

Obviously they may progress at different rates and still wind up hitting
the same regionserver at times, but at least they would start out more
evenly distributed.

Is this easily possible, without first obtaining a list of all rows and
manually batching them up?
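
To make the idea concrete, here is a minimal sketch of the wrap-around order described above, assuming the client can learn the start key of each region (the HBase client exposes region start keys, for example through HTable.getStartKeys() in the client versions current at the time of this post). The class RotatedScanPlan, its Range type, and the plan method are hypothetical names used only for illustration; row keys are modelled as strings rather than byte arrays, and "" stands for an unbounded start or stop row, as in HBase. Each client would run one scan from its local region's start key to the end of the table, then a second scan from the beginning of the table up to that key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: plans the row ranges for a full scan that begins
// at the local region and wraps around to the start of the table.
public class RotatedScanPlan {

    /** Half-open row range [startRow, stopRow); "" means unbounded. */
    public static final class Range {
        public final String startRow;
        public final String stopRow;

        public Range(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + stopRow + ")";
        }
    }

    /**
     * regionStartKeys: start key of every region in table order; the first
     * region's start key is "" by HBase convention.
     * localRegionIndex: index of the region hosted next to this client.
     */
    public static List<Range> plan(List<String> regionStartKeys, int localRegionIndex) {
        String pivot = regionStartKeys.get(localRegionIndex);
        List<Range> ranges = new ArrayList<>();
        // First pass: from the local region's first row to the end of the table.
        ranges.add(new Range(pivot, ""));
        // Second pass: from the start of the table up to (not including) the pivot.
        // Skipped when the pivot is already the start of the table, because
        // a range of ["", "") would rescan everything.
        if (!pivot.isEmpty()) {
            ranges.add(new Range("", pivot));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Mirrors the 3-regionserver example: regions start at "", "3", "5".
        List<String> startKeys = Arrays.asList("", "3", "5");
        for (int i = 0; i < startKeys.size(); i++) {
            System.out.println("client" + (i + 1) + ": " + plan(startKeys, i));
        }
    }
}
```

In a real client, each Range would presumably map to one Scan with its start row and stop row set accordingly (Scan.setStartRow and Scan.setStopRow), so no list of all rows is needed, only the region boundaries. This is a sketch of the ordering logic only; it does not handle regions splitting or moving while the scans run.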

 