HBase >> mail # user >> Best way to query multiple sets of rows


Re: Best way to query multiple sets of rows
Hi Graeme,
Are you familiar with Phoenix (https://github.com/forcedotcom/phoenix),
a SQL skin over HBase? We've just introduced a new feature (still in the
master branch) that'll do what you're looking for: transparently doing a
skip scan over the chunks of your HBase data based on your SQL query. It
leverages HBase's ability to have a filter return a "skip next" hint.
We've found it can make a pretty dramatic performance difference (50x), depending
on the cardinality of your data and the size of the chunks you're returning.
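
To make the skip-scan idea a bit more concrete, here is a rough, self-contained
sketch in plain Java. It is not Phoenix's implementation and it does not use the
HBase client API: a TreeMap stands in for a table sorted by row key, and the
names SkipScanSketch, Range, skipScan and seeks are all made up for
illustration. It assumes the ranges are sorted, non-overlapping, and half-open
[start, stop), the way a Scan's start and stop rows are. The "seek" step plays
the role of the hint a filter can hand back to HBase to jump ahead instead of
reading every row in a gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: a TreeMap stands in for an HBase table sorted by row key.
final class SkipScanSketch {

    /** Hypothetical half-open row-key range [start, stop). */
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Seeks done by the most recent skipScan call (kept only for illustration). */
    static int seeks;

    /**
     * Returns the row keys that fall inside any of the given ranges.
     * Ranges are assumed to be sorted by start key and non-overlapping.
     */
    static List<String> skipScan(NavigableMap<String, String> table, List<Range> ranges) {
        List<String> hits = new ArrayList<>();
        seeks = 0;
        if (ranges.isEmpty()) {
            return hits;
        }

        int r = 0;
        // Initial seek: jump straight to the first row at or after the first range.
        Map.Entry<String, String> row = table.ceilingEntry(ranges.get(0).start);
        seeks++;

        while (row != null && r < ranges.size()) {
            String key = row.getKey();
            Range current = ranges.get(r);

            if (key.compareTo(current.stop) >= 0) {
                // Past the end of this range: move on to the next range.
                r++;
            } else if (key.compareTo(current.start) < 0) {
                // In a gap between ranges: seek to the start of the range
                // (this is the "skip next" hint) rather than reading the gap.
                row = table.ceilingEntry(current.start);
                seeks++;
            } else {
                // Inside the range: emit the row and read the next one in order.
                hits.add(key);
                row = table.higherEntry(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a1", "a2", "b1", "c1", "c2", "d1", "e1", "f1"}) {
            table.put(k, "value-" + k);
        }
        List<Range> ranges = Arrays.asList(
                new Range("a", "b"), new Range("c", "d"), new Range("e", "f"));

        System.out.println(skipScan(table, ranges)); // prints [a1, a2, c1, c2, e1]
        System.out.println("seeks: " + seeks);       // prints seeks: 3
    }
}
```

With real data the gaps between ranges would hold many rows rather than one, so
the win comes from how much of the table each seek lets the scan avoid reading.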

Thanks,
James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com/

On 04/08/2013 11:30 AM, Graeme Wallace wrote:
> I thought a Scan could only cope with one start row and an end row ?
>
>
> On Mon, Apr 8, 2013 at 1:27 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>> Hi Graeme,
>>
>> The scans are the right way to do that.
>>
>> They will give you back all the data you need, chunk by chunk. Then
>> you have to iterate over the data to do what you want with it.
>>
>> What was your expectation? I'm not sure I'm getting your "so that i
>> dont have to issue sequential Scans".
>>
>> jM
>>
>> 2013/4/8 Graeme Wallace <[EMAIL PROTECTED]>:
>>> Hi,
>>>
>>> Maybe there is an obvious way but I'm not seeing it.
>>>
>>> I have a need to query HBase for multiple chunks of data, that is something
>>> equivalent to
>>>
>>> select columns
>>> from table
>>> where rowid between A and B
>>> or rowid between C and D
>>> or rowid between E and F
>>> etc.
>>>
>>> in SQL.
>>>
>>> What's the best way to go about doing this so that I don't have to issue
>>> sequential Scans ?
>>>
>>> --
>>> Graeme Wallace
>>> CTO
>>> FareCompare.com
>>> O: 972 588 1414
>>> M: 214 681 9018
>
>