HBase >> mail # user >> Full table scan from random starting point?


Re: Full table scan from random starting point?
Hi Robert,

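You can build a random start key, give it to your scanner as the start row,
and scan until the end of the table. Then open a new scanner from the
beginning of the table and give it that same key as the stop row. Between
them, the two scans cover every row once, in the wrap-around order you are
looking for.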
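
To make the two-scanner idea concrete, here is a minimal sketch in Python.
It does not talk to HBase: a sorted list stands in for the table's row keys,
and the names wraparound_scan, rows and random_key are made up for
illustration. It assumes the second scanner's stop row is exclusive, so the
start key is not read twice.

```python
import bisect
import random


def wraparound_scan(sorted_keys, start_key):
    """Yield every key once, starting at start_key and wrapping around.

    sorted_keys is a stand-in for a table whose rows are kept in sorted
    key order. start_key does not have to be an existing row.
    """
    # Position of the first key >= start_key.
    split = bisect.bisect_left(sorted_keys, start_key)
    # "Scanner 1": from start_key (inclusive) to the end of the table.
    yield from sorted_keys[split:]
    # "Scanner 2": from the start of the table up to start_key (exclusive).
    yield from sorted_keys[:split]


rows = ["row1", "row2", "row3", "row4", "row5", "row6"]

# A client starting at row3 sees the order from the question's client2.
print(list(wraparound_scan(rows, "row3")))
# ['row3', 'row4', 'row5', 'row6', 'row1', 'row2']

# The start key can be built at random without knowing the real rows.
random_key = "row" + str(random.randint(0, 9))
print(random_key, list(wraparound_scan(rows, random_key)))
```

With a real client the two slices would be two scans: one with only a start
row set, and one with only a stop row set.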

Also, this might interest you:
https://issues.apache.org/jira/browse/HBASE-9272

JM
2014-01-31 Robert Dyer <[EMAIL PROTECTED]>:

> Let's say I have one client on each of my regionservers.  Each client needs
> to do a full scan on the same table.  The order in which the rows are
> scanned by clients does not matter.
>
> Is it possible to have each client start at a random (or better, the first
> row located on the local rs) point in the table so that if I start all of
> them at once they don't all peg the same rs for reads?
>
> Example (to keep it simple, assume 3 RS):
>
> RS1: rows 1-2
> RS2: rows 3-4
> RS3: rows 5-6
>
> client1 (on RS1) reads rows: 1, 2, 3, 4, 5, 6
> client2 (on RS2) reads rows: 3, 4, 5, 6, 1, 2
> client3 (on RS3) reads rows: 5, 6, 1, 2, 3, 4
>
> Obviously they may progress at different rates and still wind up hitting
> the same RSs, but at least we can start out a bit more distributed.
>
> Is this easily possible, without first obtaining a list of all rows and
> manually batching them up?
>

 