HBase >> mail # user >> Full table scan from random starting point?


Full table scan from random starting point?
Let's say I have one client on each of my regionservers.  Each client needs
to do a full scan on the same table.  The order in which the rows are
scanned by clients does not matter.

Is it possible to have each client start at a random point in the table
(or better, at the first row located on the local RS) so that if I start
all of them at once they don't all peg the same RS for reads?

Example (to keep it simple, assume 3 RS):

RS1: rows 1-2
RS2: rows 3-4
RS3: rows 5-6

client1 (on RS1) reads rows: 1, 2, 3, 4, 5, 6
client2 (on RS2) reads rows: 3, 4, 5, 6, 1, 2
client3 (on RS3) reads rows: 5, 6, 1, 2, 3, 4

Obviously the clients may progress at different rates and still wind up
hitting the same RS at the same time, but at least they would start out a
bit more distributed.

Is this easily possible, without first obtaining a list of all rows and
manually batching them up?
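
To make the read order I have in mind concrete, here is a minimal plain-Java
sketch. It does not use the HBase client at all: a TreeMap stands in for the
sorted table, and the class name WrapAroundScan, the method scanOrder, and the
per-client start rows are made up for illustration. Against a real table I
would guess this amounts to two scans per client (start row to end of table,
then beginning of table up to the start row), but that part is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WrapAroundScan {

    // Returns the row keys in the order a client would visit them:
    // first [startRow, end of table], then [start of table, startRow).
    // The TreeMap is only a stand-in for a table sorted by row key.
    static List<Integer> scanOrder(TreeMap<Integer, String> table, int startRow) {
        List<Integer> order = new ArrayList<>();
        // First pass: from the start row (inclusive) to the last row.
        order.addAll(table.tailMap(startRow, true).keySet());
        // Second pass: wrap around and read the rows before the start row.
        order.addAll(table.headMap(startRow, false).keySet());
        return order;
    }

    public static void main(String[] args) {
        // Rows 1-6, as in the 3-RS example above.
        TreeMap<Integer, String> table = new TreeMap<>();
        for (int row = 1; row <= 6; row++) {
            table.put(row, "value-" + row);
        }

        // Hypothetical: the first row hosted on each client's local RS.
        int[] firstLocalRow = {1, 3, 5};
        for (int i = 0; i < firstLocalRow.length; i++) {
            System.out.println("client" + (i + 1) + " reads rows: "
                    + scanOrder(table, firstLocalRow[i]));
        }
    }
}
```

Every client still reads all six rows exactly once; only the starting point
differs, so at startup each client would be reading from a different RS.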

 