HBase, mail # user - Best way to query multiple sets of rows

Graeme Wallace 2013-04-08, 18:23
Jean-Marc Spaggiari 2013-04-08, 18:27
Graeme Wallace 2013-04-08, 18:30
Jean-Marc Spaggiari 2013-04-08, 18:36
Graeme Wallace 2013-04-08, 18:39
Ted Yu 2013-04-08, 18:39
Graeme Wallace 2013-04-08, 19:10
Jean-Marc Spaggiari 2013-04-08, 20:31
Ted Yu 2013-04-08, 20:55
James Taylor 2013-04-08, 18:39
Shixiaolong 2013-04-09, 03:00
Re: Best way to query multiple sets of rows
lars hofhansl 2013-04-08, 21:37
We've had some discussions about turning a set of Gets into a (smaller) set of Scans. That is only partially applicable here, though.

In your case I think you have two options:
1. Fire off multiple scans. You can do that in parallel from the client. Each one will home in on its start row with only a single seek.
2. Use a custom filter to do a skip scan. You'd pass the start/end keys of every slice to your filter, and after each slice of rows the filter would provide a seek hint to the start of the next slice. That way you can handle this with only a single scan, and with just as many (initial) seeks as you have slices.
#1 seems to be a fine option.
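
To make option 1 concrete, here is a rough sketch of the idea in plain Python. It does not use the HBase client at all: `ROWS`, `scan` and `parallel_scans` are hypothetical names, and a sorted in-memory list stands in for the table. It assumes each range is [start, stop), i.e. start inclusive and stop exclusive, so an inclusive SQL-style BETWEEN would need the stop key adjusted.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a table: row keys kept in sorted order.
ROWS = sorted(f"row{i:03d}" for i in range(100))

def scan(start, stop):
    """Simulate one scan over [start, stop): a single seek, then a sequential read."""
    i = bisect_left(ROWS, start)  # the single seek to the start row
    out = []
    while i < len(ROWS) and ROWS[i] < stop:
        out.append(ROWS[i])
        i += 1
    return out

def parallel_scans(ranges, max_workers=4):
    """Run one scan per range concurrently; results come back in range order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(lambda r: scan(*r), ranges))
    return [row for chunk in chunks for row in chunk]

result = parallel_scans([
    ("row010", "row013"),
    ("row040", "row042"),
    ("row090", "row091"),
])
```

With a real client, each call to `scan` would presumably become its own Scan with a start and stop row, submitted from a client-side thread pool.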
As James pointed out, Phoenix does this for you already (including multiple scans and the skip scan logic, whichever makes more sense in the situation).
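
For comparison, option 2 (the skip scan) can be sketched the same way. Again this is only a simulation over a sorted list, not a real HBase filter: `skip_scan` is a hypothetical helper, and the `bisect_left` call plays the role of the seek hint that a custom filter would hand back to the scanner at the end of each slice. It assumes [start, stop) ranges that do not overlap.

```python
from bisect import bisect_left

# Hypothetical stand-in for a table: row keys kept in sorted order.
ROWS = sorted(f"row{i:03d}" for i in range(100))

def skip_scan(rows, ranges):
    """One forward pass over sorted rows, visiting each [start, stop) slice.

    Returns (matched_rows, seek_count).
    """
    matched, seeks = [], 0
    pos = 0
    for start, stop in sorted(ranges):
        # Seek hint: jump straight to the first row >= start of the next slice,
        # never moving backwards from the current position.
        pos = bisect_left(rows, start, lo=pos)
        seeks += 1
        while pos < len(rows) and rows[pos] < stop:
            matched.append(rows[pos])
            pos += 1
    return matched, seeks

matched, seeks = skip_scan(ROWS, [
    ("row040", "row042"),
    ("row010", "row013"),
    ("row090", "row091"),
])
```

The seek count equals the number of slices, which matches the "just as many seeks as slices" point above; the rows between slices are never read.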

-- Lars

 From: Graeme Wallace <[EMAIL PROTECTED]>
Sent: Monday, April 8, 2013 11:23 AM
Subject: Best way to query multiple sets of rows

Maybe there is an obvious way but I'm not seeing it.

I have a need to query HBase for multiple chunks of data, that is something
equivalent to

select columns
from table
where rowid between A and B
or rowid between C and D
or rowid between E and F

in SQL.

What's the best way to go about doing this so that I don't have to issue
sequential Scans?

Graeme Wallace
O: 972 588 1414
M: 214 681 9018