HBase >> mail # user >> Best way to query multiple sets of rows

Re: Best way to query multiple sets of rows
Hi Graeme,
Are you familiar with Phoenix (https://github.com/forcedotcom/phoenix),
a SQL skin over HBase? We've just introduced a new feature (still in the
master branch) that'll do what you're looking for: transparently doing a
skip scan over the chunks of your HBase data based on your SQL query. It
leverages HBase's ability to have a filter return a "skip next" hint.
We've found it can make a pretty dramatic performance difference (50x), depending
on the cardinality of your data and the size of the chunks you're returning.
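
To make the skip-scan idea concrete, here is a minimal sketch in plain Java. It is not Phoenix's implementation and does not use the HBase client API: a `TreeMap` stands in for a table sorted by row key, and the class `SkipScanSketch`, the type `RowRange`, and the method `skipScan` are hypothetical names invented for this illustration. The "seek" branch plays the role of the filter's "skip next" hint (in HBase, a filter can signal this with `ReturnCode.SEEK_NEXT_USING_HINT`). The ranges are assumed to be sorted by start key and non-overlapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class SkipScanSketch {

    /** Hypothetical half-open row-key range [start, stop), like a Scan's start/stop row. */
    static final class RowRange {
        final String start;
        final String stop;

        RowRange(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /**
     * Walks the sorted "table" once, jumping over the gaps between ranges.
     * Assumes ranges are sorted by start key and do not overlap.
     */
    static List<String> skipScan(NavigableMap<String, String> table, List<RowRange> ranges) {
        List<String> hits = new ArrayList<>();
        if (ranges.isEmpty()) {
            return hits;
        }
        int i = 0;
        // Position the cursor on the first row at or after the first range's start.
        String key = table.ceilingKey(ranges.get(0).start);
        while (key != null && i < ranges.size()) {
            RowRange r = ranges.get(i);
            if (key.compareTo(r.start) < 0) {
                // Row lies in a gap: seek straight to the next range's start
                // instead of reading every row in between.
                key = table.ceilingKey(r.start);
            } else if (key.compareTo(r.stop) < 0) {
                // Row is inside the current range: include it and step forward.
                hits.add(key);
                key = table.higherKey(key);
            } else {
                // Row is past the current range: move on to the next range
                // and re-check the same row against it.
                i++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int n = 1; n <= 20; n++) {
            table.put(String.format("row%02d", n), "value" + n);
        }
        List<RowRange> ranges = Arrays.asList(
                new RowRange("row03", "row05"),
                new RowRange("row10", "row12"),
                new RowRange("row18", "row19"));
        // prints [row03, row04, row10, row11, row18]
        System.out.println(skipScan(table, ranges));
    }
}
```

The sketch only touches rows inside the requested ranges plus one row past each range end. That is why the benefit depends on how large the gaps are relative to the chunks being returned.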


On 04/08/2013 11:30 AM, Graeme Wallace wrote:
> I thought a Scan could only cope with one start row and one end row?
> On Mon, Apr 8, 2013 at 1:27 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>> Hi Graeme,
>> The scans are the right way to do that.
>> They will give you back all the data you need, chunk by chunk. Then
>> you have to iterate over the data to do what you want with it.
>> What was your expectation? I'm not sure I'm getting your "so that i
>> dont have to issue sequential Scans".
>> jM
>> 2013/4/8 Graeme Wallace <[EMAIL PROTECTED]>:
>>> Hi,
>>> Maybe there is an obvious way but I'm not seeing it.
>>> I have a need to query HBase for multiple chunks of data, that is,
>>> something equivalent to
>>> select columns
>>> from table
>>> where rowid between A and B
>>> or rowid between C and D
>>> or rowid between E and F
>>> etc.
>>> in SQL.
>>> What's the best way to go about doing this so that I don't have to issue
>>> sequential Scans?
>>> --
>>> Graeme Wallace
>>> CTO
>>> FareCompare.com
>>> O: 972 588 1414
>>> M: 214 681 9018