HBase >> mail # user >> Best way to query multiple sets of rows


Re: Best way to query multiple sets of rows
We've had some discussions about turning a set of Gets into a (smaller) set of Scans. That is only partially applicable here, though.

In your case I think you have two options:
1. Fire off multiple scans. You can do that in parallel from the client. Each one will home in on its start row with only a single seek.
2. Use a custom filter to do a skip scan. You'd pass the start/end keys of each slice to your filter, and after each slice of rows the filter would provide a seek hint to the next slice. That way you can handle this with only a single scan, and just as many (initial) seeks as you have slices.
#1 seems to be a fine option.
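
To make option 1 concrete, here is a minimal sketch of the fan-out pattern. It does not use the HBase client: a sorted in-memory `NavigableMap` stands in for the table, and `ParallelRangeScans`, `RowRange` and `scanAll` are hypothetical names invented for illustration. With a real table, each task would presumably open its own Scan bounded by the slice's start and stop row instead of calling `subMap`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRangeScans {
    /** Hypothetical half-open row range [start, stop). */
    public static final class RowRange {
        final String start;
        final String stop;
        public RowRange(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Runs one range read per slice in parallel; results keep the order of 'ranges'. */
    public static List<String> scanAll(NavigableMap<String, String> table, List<RowRange> ranges) {
        // Assumption: the map is not modified while the reads are running.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, Math.min(ranges.size(), 8)));
        try {
            List<Future<List<String>>> futures = new ArrayList<Future<List<String>>>();
            for (final RowRange r : ranges) {
                // Stand-in for a bounded scan: one positioning step, then a sequential read.
                Callable<List<String>> task = () ->
                        new ArrayList<String>(table.subMap(r.start, true, r.stop, false).keySet());
                futures.add(pool.submit(task));
            }
            List<String> rows = new ArrayList<String>();
            for (Future<List<String>> f : futures) {
                rows.addAll(f.get()); // blocks until that slice has been read
            }
            return rows;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("range read failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the output grouped by slice even though the reads themselves may finish in any order.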
As James pointed out, Phoenix does this for you already (including multiple scans and the skip scan logic, whichever makes more sense in the situation).
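
The skip scan in option 2 can be sketched in the same simulated setting. Again this is not HBase code: `SkipScanSketch`, `Result` and `scan` are hypothetical names, the table is an in-memory `NavigableMap`, and the "seek hint" is modelled as a `ceilingKey` call. In HBase itself a filter would signal the jump through the `SEEK_NEXT_USING_HINT` return code; the details of such a filter are left out here. The sketch assumes the slices are sorted, non-overlapping and half-open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class SkipScanSketch {
    /** Rows returned by the scan plus the number of seeks it needed. */
    public static final class Result {
        public final List<String> rows = new ArrayList<String>();
        public int seeks = 0;
    }

    /**
     * Single pass over 'table' that visits only rows inside the given slices.
     * Each slice is a {start, stop} pair, read as the half-open range [start, stop).
     * Assumption: slices are sorted by start key and do not overlap.
     */
    public static Result scan(NavigableMap<String, String> table, String[][] slices) {
        Result result = new Result();
        if (slices.length == 0) {
            return result;
        }
        int i = 0;
        // Initial seek to the first slice's start row.
        String row = table.ceilingKey(slices[0][0]);
        result.seeks++;
        while (row != null && i < slices.length) {
            String start = slices[i][0];
            String stop = slices[i][1];
            if (row.compareTo(stop) >= 0) {
                i++; // current slice is exhausted, move on to the next one
            } else if (row.compareTo(start) < 0) {
                // Row lies in the gap before the slice: jump ahead (the "seek hint").
                row = table.ceilingKey(start);
                result.seeks++;
            } else {
                result.rows.add(row);        // row is inside the slice
                row = table.higherKey(row);  // plain sequential step, no seek
            }
        }
        return result;
    }
}
```

For a table with rows a through g and the slices [a, c) and [e, g), this returns a, b, e, f using two seeks, one per slice, and never touches row d.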

-- Lars

________________________________
 From: Graeme Wallace <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Monday, April 8, 2013 11:23 AM
Subject: Best way to query multiple sets of rows
 
Hi,

Maybe there is an obvious way but I'm not seeing it.

I have a need to query HBase for multiple chunks of data, that is something
equivalent to

select columns
from table
where rowid between A and B
or rowid between C and D
or rowid between E and F
etc.

in SQL.

What's the best way to go about doing this so that I don't have to issue
sequential Scans?

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018