HBase, mail # user - Best way to query multiple sets of rows


Re: Best way to query multiple sets of rows
lars hofhansl 2013-04-08, 21:37
We've had some discussions about turning a set of Gets into a (smaller) set of Scans. That is only partially applicable here, though.

In your case I think you have two options:
1. Fire off multiple scans. You can do that in parallel from the client. Each one will home in on its start row with only a single seek.
2. Use a custom filter to do a skip scan. You'd pass the start/end keys to your filter, and after each slice of rows the filter would provide a seek hint to the next slice. That way you can handle this with a single scan, and with only as many (initial) seeks as you have slices.
#1 seems to be a fine option.
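
A minimal sketch of option #1, as an in-memory model rather than real HBase client code: a sorted TreeMap stands in for the table, and each slice is scanned by its own task. The names ParallelRangeScans, KeyRange and scanAll are made up for illustration. In a real client each task would presumably open its own Scan with the slice's start and stop rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// In-memory model only: a sorted map stands in for a table ordered by row key.
class ParallelRangeScans {

    // Hypothetical helper type: one slice of row keys, [start, stop).
    static final class KeyRange {
        final String start; // inclusive
        final String stop;  // exclusive
        KeyRange(String start, String stop) { this.start = start; this.stop = stop; }
    }

    // Runs one range scan per slice in parallel and returns the matching row keys,
    // concatenated in the order the slices were given.
    static List<String> scanAll(NavigableMap<String, String> table, List<KeyRange> ranges) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (KeyRange r : ranges) {
                // subMap positions directly at r.start, analogous to a scan
                // seeking once to its start row and reading until the stop row.
                Callable<List<String>> task = () ->
                        new ArrayList<>(table.subMap(r.start, true, r.stop, false).keySet());
                futures.add(pool.submit(task));
            }
            List<String> rows = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                rows.addAll(f.get()); // wait for each scan, preserving slice order
            }
            return rows;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("range scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
            table.put(k, "value-" + k);
        }
        List<KeyRange> ranges = Arrays.asList(new KeyRange("a", "c"), new KeyRange("e", "g"));
        System.out.println(scanAll(table, ranges)); // prints [a, b, e, f]
    }
}
```

The table is only read while the tasks run, so sharing it between threads is safe in this model.
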
As James pointed out, Phoenix does this for you already (including multiple scans and the skip scan logic, whichever makes more sense in the situation).
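
The skip scan idea from option #2 can be sketched in memory as well. This is not the HBase filter API, only a model of the control flow: a single pass over a sorted TreeMap, where ceilingKey plays the role of a seek hint and higherKey plays the role of "next row". SkipScanSketch and its seeks counter are hypothetical names, and the slices are assumed to be sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model only: one pass over a sorted map, jumping between slices.
class SkipScanSketch {

    int seeks = 0; // how many seeks (jumps) the scan performed

    // ranges: sorted, non-overlapping {start (inclusive), stop (exclusive)} pairs.
    List<String> scan(NavigableMap<String, String> table, String[][] ranges) {
        List<String> rows = new ArrayList<>();
        if (ranges.length == 0) {
            return rows;
        }
        int i = 0; // index of the slice currently being served
        String key = table.ceilingKey(ranges[0][0]); // initial seek to the first slice
        seeks++;
        while (key != null && i < ranges.length) {
            if (key.compareTo(ranges[i][1]) >= 0) {
                i++; // current row is past this slice: move on to the next slice
            } else if (key.compareTo(ranges[i][0]) < 0) {
                key = table.ceilingKey(ranges[i][0]); // "seek hint": jump to the slice start
                seeks++;
            } else {
                rows.add(key);              // row lies inside the slice: include it
                key = table.higherKey(key); // stand-in for reading the next row
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
            table.put(k, "value-" + k);
        }
        SkipScanSketch sketch = new SkipScanSketch();
        List<String> rows = sketch.scan(table, new String[][] {{"a", "c"}, {"e", "g"}});
        System.out.println(rows + " with " + sketch.seeks + " seeks"); // prints [a, b, e, f] with 2 seeks
    }
}
```

In this model the scan never issues more than one seek per slice, and it skips the seek entirely when the next slice starts exactly where the previous one ended.
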

-- Lars

________________________________
 From: Graeme Wallace <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Monday, April 8, 2013 11:23 AM
Subject: Best way to query multiple sets of rows
 
Hi,

Maybe there is an obvious way, but I'm not seeing it.

I have a need to query HBase for multiple chunks of data, that is something
equivalent to

select columns
from table
where rowid between A and B
or rowid between C and D
or rowid between E and F
etc.

in SQL.

What's the best way to go about doing this so that I don't have to issue
sequential Scans?

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018