
Robert Dyer 2014-01-31, 22:17
Re: Full table scan from random starting point?
Hi Robert,

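You can build a random start key and give it to a scanner, which then scans
to the end of the table. Then open a second scanner from the beginning of the
table and give it that same key as its stop key. Together the two scans cover
every row exactly once, in the wrapped-around order you are looking for.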
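
To make the wrap-around idea concrete, here is a minimal sketch in plain Python. It does not use the HBase client at all: a sorted list stands in for the table, and the names wraparound_scan and rows are made up for illustration. In HBase the first loop would correspond to a scanner with the random key as start row and no stop row, and the second loop to a scanner with no start row and the same key as an exclusive stop row.

```python
import bisect

def wraparound_scan(sorted_keys, start_key):
    """Yield every key once, starting at start_key and wrapping around.

    sorted_keys is a hypothetical stand-in for a table's row keys,
    which HBase keeps in sorted order.
    """
    # Position of the first key >= start_key (the key need not exist).
    split = bisect.bisect_left(sorted_keys, start_key)
    # First scan: from start_key to the end of the table.
    for key in sorted_keys[split:]:
        yield key
    # Second scan: from the start of the table up to, but not
    # including, start_key (the stop key is exclusive).
    for key in sorted_keys[:split]:
        yield key

rows = ["row1", "row2", "row3", "row4", "row5", "row6"]
from_row3 = list(wraparound_scan(rows, "row3"))
from_random = list(wraparound_scan(rows, "row35"))
```

Because the stop key of the second scan is exclusive and the start key of the first is inclusive, no row is read twice, and the start key does not have to be an existing row.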

Also, this might interest you:

2014-01-31 Robert Dyer <[EMAIL PROTECTED]>:

> Let's say I have one client on each of my regionservers.  Each client needs
> to do a full scan on the same table.  The order in which the rows are
> scanned by clients does not matter.
> Is it possible to have each client start at a random (or better, the first
> row located on the local rs) point in the table so that if I start all of
> them at once they don't all peg the same rs for reads?
> Example (to keep it simple, assume 3 RS):
> RS1: rows 1-2
> RS2: rows 3-4
> RS3: rows 5-6
> client1 (on RS1) reads rows: 1, 2, 3, 4, 5, 6
> client2 (on RS2) reads rows: 3, 4, 5, 6, 1, 2
> client3 (on RS3) reads rows: 5, 6, 1, 2, 3, 4
> Obviously they may progress at different rates and still wind up hitting
> the same RSs, but at least we can start out a bit more distributed.
> Is this easily possible, without first obtaining a list of all rows and
> manually batching them up?