HBase >> mail # user >> Best way to query multiple sets of rows


Graeme Wallace 2013-04-08, 18:23
Jean-Marc Spaggiari 2013-04-08, 18:27
Graeme Wallace 2013-04-08, 18:30
Jean-Marc Spaggiari 2013-04-08, 18:36
Graeme Wallace 2013-04-08, 18:39
Ted Yu 2013-04-08, 18:39
Graeme Wallace 2013-04-08, 19:10
Re: Best way to query multiple sets of rows
Hi Graeme,

Each time filterRowKey returns true, the entire row is skipped, so the
data for that row will not be read. However, there might still be some
disk access if everything is not in memory, but no more than with a
"regular" scan without any filter.

I still think that running the 3 scans in a row without any filter will
be faster than using the filter, since there will be fewer operations.
But both options should work.

JMS
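To make the skip-or-keep decision concrete, here is a minimal, self-contained sketch of the range check such a filter would perform. The class name RowRangeChecker is hypothetical; inside a real HBase FilterBase subclass this logic would live in filterRowKey, where returning true means "skip this row". The comparison mimics HBase's unsigned lexicographic row-key ordering (as in Bytes.compareTo):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing the core of a multi-range row filter:
// given [start, stop) row-key pairs, decide whether a row key falls
// inside any of them. In a FilterBase subclass, filterRowKey would
// return !inAnyRange(row, ranges) (true means "exclude this row").
public class RowRangeChecker {

    // Unsigned lexicographic byte[] comparison, matching the ordering
    // HBase uses for row keys.
    static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // ranges is a flat array: start0, stop0, start1, stop1, ...
    // Each range is half-open: start inclusive, stop exclusive.
    static boolean inAnyRange(byte[] row, byte[][] ranges) {
        for (int i = 0; i + 1 < ranges.length; i += 2) {
            if (compare(row, ranges[i]) >= 0
                    && compare(row, ranges[i + 1]) < 0) {
                return true;
            }
        }
        return false;
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        byte[][] ranges = { b("a"), b("c"), b("m"), b("p") };
        System.out.println(inAnyRange(b("b"), ranges)); // true: a <= b < c
        System.out.println(inAnyRange(b("k"), ranges)); // false: between ranges
        System.out.println(inAnyRange(b("m"), ranges)); // true: start inclusive
        System.out.println(inAnyRange(b("p"), ranges)); // false: stop exclusive
    }
}
```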

2013/4/8 Graeme Wallace <[EMAIL PROTECTED]>:
> Everyone - thanks for the replies.
>
> I have a followup question on Filters.
>
> boolean filterRowKey(byte [] buffer, int offset, int length)
>
> Suppose I implement this to decide whether to include or exclude a row
> based upon my sets of rowkey pairs.
>
> How much disk I/O is involved on each region server? Will it just read
> row keys (hopefully from cache) until I say I need a row, then read the
> KeyValues for the columns I want and pass them into filterKeyValue()?
>
> Is that the most efficient way of doing it? I don't see a way of hinting
> at the next row I'm interested in (I'm assuming row keys are ordered?),
> so does that mean that for each region all the row keys are passed into
> the filter?
>
>
>
> On Mon, Apr 8, 2013 at 1:39 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> For Scan:
>>
>>  * To add a filter, execute {@link
>> #setFilter(org.apache.hadoop.hbase.filter.Filter) setFilter}.
>>
>> Take a look at RowFilter:
>>
>>  * This filter is used to filter based on the key. It takes an operator
>>
>>  * (equal, greater, not equal, etc) and a byte [] comparator for the row,
>>
>> You can enhance RowFilter so that you may specify the pair(s) of start and
>> end rows.
>>
>> Cheers
>>
>> On Mon, Apr 8, 2013 at 11:30 AM, Graeme Wallace <
>> [EMAIL PROTECTED]> wrote:
>>
>> > I thought a Scan could only cope with one start row and an end row ?
>> >
>> >
>> > On Mon, Apr 8, 2013 at 1:27 PM, Jean-Marc Spaggiari <
>> > [EMAIL PROTECTED]
>> > > wrote:
>> >
>> > > Hi Graeme,
>> > >
>> > > The scans are the right way to do that.
>> > >
>> > > They will give you back all the data you need, chunk by chunk. Then
>> > > you have to iterate over the data to do what you want with it.
>> > >
>> > > What was your expectation? I'm not sure I'm getting your "so that I
>> > > don't have to issue sequential Scans".
>> > >
>> > > jM
>> > >
>> > > 2013/4/8 Graeme Wallace <[EMAIL PROTECTED]>:
>> > > > Hi,
>> > > >
>> > > > Maybe there is an obvious way, but I'm not seeing it.
>> > > >
>> > > > I need to query HBase for multiple chunks of data, that is,
>> > > > something equivalent to
>> > > >
>> > > > select columns
>> > > > from table
>> > > > where rowid between A and B
>> > > > or rowid between C and D
>> > > > or rowid between E and F
>> > > > etc.
>> > > >
>> > > > in SQL.
>> > > >
>> > > > What's the best way to go about doing this so that I don't have to
>> > > > issue sequential Scans?
>> > > >
>> > > > --
>> > > > Graeme Wallace
>> > > > CTO
>> > > > FareCompare.com
>> > > > O: 972 588 1414
>> > > > M: 214 681 9018
>> > >
>> >
>> >
>> >
>> > --
>> > Graeme Wallace
>> > CTO
>> > FareCompare.com
>> > O: 972 588 1414
>> > M: 214 681 9018
>> >
>>
>
>
>
> --
> Graeme Wallace
> CTO
> FareCompare.com
> O: 972 588 1414
> M: 214 681 9018
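Ted's suggestion above, enhancing RowFilter so it accepts pairs of start and end rows, works best if the filter keeps its ranges sorted and non-overlapping, so each incoming row key can be checked against them in order. A small self-contained sketch of that normalization step, using String keys for readability (real HBase row keys are byte[] compared lexicographically); the class and method names here are illustrative, not part of any HBase API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing for a multi-range row filter: sort the
// [start, stop) ranges and merge any that overlap or touch, so the
// filter can walk them in order (or binary-search them) per row key.
public class RangeNormalizer {

    static final class Range {
        final String start, stop; // start inclusive, stop exclusive
        Range(String start, String stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + "," + stop + ")"; }
    }

    static List<Range> normalize(List<Range> in) {
        List<Range> sorted = new ArrayList<>(in);
        sorted.sort((a, b) -> a.start.compareTo(b.start));
        List<Range> out = new ArrayList<>();
        for (Range r : sorted) {
            Range last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && r.start.compareTo(last.stop) <= 0) {
                // Overlapping or adjacent: extend the previous range.
                if (r.stop.compareTo(last.stop) > 0) {
                    out.set(out.size() - 1, new Range(last.start, r.stop));
                }
            } else {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range("m", "p"));
        ranges.add(new Range("a", "c"));
        ranges.add(new Range("b", "d")); // overlaps [a,c)
        System.out.println(normalize(ranges)); // [[a,d), [m,p)]
    }
}
```

With the ranges normalized like this, the filter's per-row check never has to consider two ranges covering the same key.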
Ted Yu 2013-04-08, 20:55
James Taylor 2013-04-08, 18:39
Shixiaolong 2013-04-09, 03:00
lars hofhansl 2013-04-08, 21:37