HBase, mail # user - Full table scan from random starting point?

Robert Dyer 2014-01-31, 22:17
Re: Full table scan from random starting point?
Jean-Marc Spaggiari 2014-02-01, 01:47
Hi Robert,

You can build a random start key, give it to your scanner, and scan until
the end of the table. Then open a new scanner from the beginning of the
table and give it that same key as its end key. Between the two scanners
you cover the whole table once, in the order you are looking for.
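
The two-scanner idea above can be sketched as follows. This is only an illustration under stated assumptions, not HBase client code: a java.util.TreeMap stands in for the table (both keep rows sorted by key), and the names WrapAroundScan and scanFrom are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in for an HBase table: a sorted map from row key to value.
// (Assumption for illustration; a real client would use scanners instead.)
class WrapAroundScan {

    // Returns every row key exactly once, beginning at startKey and
    // wrapping around to the start of the table.
    static List<String> scanFrom(NavigableMap<String, String> table, String startKey) {
        List<String> visited = new ArrayList<>();
        // First scan: from startKey (inclusive) to the end of the table.
        visited.addAll(table.tailMap(startKey, true).keySet());
        // Second scan: from the start of the table up to startKey (exclusive).
        visited.addAll(table.headMap(startKey, false).keySet());
        return visited;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 6; i++) {
            table.put("row" + i, "value" + i);
        }
        // Same order as client2 in the question: 3, 4, 5, 6, 1, 2.
        System.out.println(scanFrom(table, "row3"));
        // prints [row3, row4, row5, row6, row1, row2]
    }
}
```

With a real table, the two ranges would presumably map to two scans: one with only a start row, and one with only a stop row. Because the stop row of an HBase scan is exclusive, reusing the same key for both should not read any row twice, and the start key does not need to be an existing row.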

Also, this might interest you:

2014-01-31 Robert Dyer <[EMAIL PROTECTED]>:

> Let's say I have one client on each of my regionservers.  Each client needs
> to do a full scan on the same table.  The order in which the rows are
> scanned by clients does not matter.
>
> Is it possible to have each client start at a random (or better, the first
> row located on the local rs) point in the table so that if I start all of
> them at once they don't all peg the same rs for reads?
>
> Example (to keep it simple, assume 3 RS):
>
> RS1: rows 1-2
> RS2: rows 3-4
> RS3: rows 5-6
>
> client1 (on RS1) reads rows: 1, 2, 3, 4, 5, 6
> client2 (on RS2) reads rows: 3, 4, 5, 6, 1, 2
> client3 (on RS3) reads rows: 5, 6, 1, 2, 3, 4
>
> Obviously they may progress at different rates and still wind up hitting
> the same RSs, but at least we can start out a bit more distributed.
>
> Is this easily possible, without first obtaining a list of all rows and
> manually batching them up?