HBase >> mail # user >> Scanning half a key or value in HBase


Re: Scanning half a key or value in HBase
The solution, if I understand correctly, is to set the start key
to "123".getBytes() and the stop key to "124".getBytes() (the stop key
can be produced by incrementing the last byte of the start key).
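
A minimal sketch of "increment the last byte", using only the JDK. The class and method names (StopKeySketch, stopKeyForPrefix) are made up for illustration and are not part of HBase. The handling of a trailing 0xFF byte, and the use of an empty array to mean "no upper bound", are assumptions added here so the helper does not overflow; the case discussed in this thread ("123" -> "124") does not hit them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StopKeySketch {
    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix, or an empty array if no such key exists.
     */
    static byte[] stopKeyForPrefix(byte[] prefix) {
        int end = prefix.length;
        // A trailing 0xFF byte cannot be incremented, so drop it and
        // increment the byte before it instead.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Assumption: an empty stop key stands for "no upper bound".
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // '3' (0x33) becomes '4' (0x34)
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "123".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopKeyForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // prints 124
    }
}
```

The resulting pair (start, stop) is what would be passed to the scan as start row and stop row.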

That returns exactly the keys you mentioned before, because if we
sort the keys with Bytes.BYTES_COMPARATOR we get the order we want:

keys: "123foo", "123_foo", "123_foo3"

["1", "2", "2", ...]  <= any other byte sequence of any length; less than the start key
["1", "2", "3"]  <= start key (exactly 3 bytes; in lexicographical byte order it is
                    always greater than any key prefixed with 122 and always less
                    than any longer key prefixed with 123)
["1", "2", "3", "_", "f", "o", "o"]
["1", "2", "3", "_", "f", "o", "o", "3"]
["1", "2", "4"]  <= stop key. Like the start key: the scan always stops as soon as
                    it reaches a key with the prefix 124.
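
The ordering above can be tried without a cluster. The sketch below uses a hand-written comparator that compares bytes as unsigned values and puts a shorter key before any longer key it prefixes, which is my understanding of how Bytes.BYTES_COMPARATOR orders row keys. A TreeSet stands in for the table, and the names (PrefixRangeSketch, BYTE_ORDER, range) are hypothetical, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class PrefixRangeSketch {
    /** Unsigned lexicographic byte order; a shorter key sorts before any longer key it prefixes. */
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns the keys in [start, stop) in sorted order: start inclusive, stop exclusive. */
    static List<String> range(List<String> keys, String start, String stop) {
        TreeSet<byte[]> sorted = new TreeSet<>(BYTE_ORDER);
        for (String k : keys) {
            sorted.add(bytes(k));
        }
        List<String> out = new ArrayList<>();
        for (byte[] k : sorted.subSet(bytes(start), bytes(stop))) {
            out.add(new String(k, StandardCharsets.US_ASCII));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("124_bar", "123_foo3", "122_zzz", "123_foo", "123");
        System.out.println(range(keys, "123", "124")); // prints [123, 123_foo, 123_foo3]
    }
}
```

Neither "122_zzz" nor "124_bar" is returned, and no key had to be looked up beforehand to find where the 123 range begins.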

So you can scan without knowing the exact start/stop keys in advance.
With a separator, the start key will be 123_ and the stop key will be the
prefix with its last byte _before the separator_ incremented by one.
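
To see what the separator changes, here is a small sketch that compares the two ranges on a key set containing 1234_foo. It assumes ASCII-only keys, for which Java's String order coincides with unsigned byte order, so a TreeSet<String> can stand in for the table. It also reads the stop key in the separator case as 124; that reading is an assumption. The key 1234_foo and the class name SeparatorRangeSketch are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class SeparatorRangeSketch {
    /** Returns the keys in [start, stop). Valid as a byte-order model only for ASCII keys. */
    static List<String> range(TreeSet<String> keys, String start, String stop) {
        return new ArrayList<>(keys.subSet(start, stop));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList(
                "122_zzz", "1234_foo", "123_foo", "123_foo2", "123_foo3", "124_bar"));

        // Start at the bare prefix: the longer number 1234 is picked up too.
        System.out.println(range(keys, "123", "124"));
        // prints [1234_foo, 123_foo, 123_foo2, 123_foo3]

        // Start at prefix plus separator: '4' sorts before '_', so 1234_foo is skipped.
        System.out.println(range(keys, "123_", "124"));
        // prints [123_foo, 123_foo2, 123_foo3]
    }
}
```

This only works because every digit sorts before '_' in ASCII; with a different separator or key alphabet the bounds would need to be rechecked.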

Try experimenting with Bytes.BYTES_COMPARATOR.

With a scan like the one above you can get all the rows very quickly
(because, with high probability, they will be in one block). Don't
forget to set caching on the scan to a higher value (100?), because by
default you get only one row per round trip to the server.

2010/8/23 Michelan Arendse <[EMAIL PROTECTED]>:
> Hi,
>
> Thanks for the responses but it's still not what I am really looking for.
>
> The row id looks something like number_string, so it would be 123_foo, 123_foo2, 123_foo3.
> So now I want to find all the foo's that are related to the first half of the key which is "123".
>
> Also I can't add start row if I do not know where 123 starts. And I can't search for the start row, as I need this to be very fast.
>
> Thanks.
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:[EMAIL PROTECTED]]
> Sent: 17 August 2010 09:01 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Scanning half a key or value in HBase
>
> Hey,
>
> One thing to watch out for is variable-length ASCII keys with a
> separator. You would think that if your key structure was:
>
> foo:bar
>
> starting at 'foo' and ending at 'foo:' might give you only keys which
> start with 'foo:', but it doesn't work like that.  You also get keys
> like:
> foo123:bar
>
> You must start the scan at 'foo:', but you can't just end it at 'foo;'
> (since next(:) == ';' in ASCII). This has to do with the ordering of
> ASCII; for a reference look at http://www.asciitable.com/
>
> The bug-free solution is to start your scan at 'foo:' and use a prefix
> filter set to 'foo:'.
>
> If you are scanning fixed-width keys, eg: binary conversions of longs,
> then the [start,start+1) solution works.
>
> On Tue, Aug 17, 2010 at 5:59 AM, Andrey Stepachev <[EMAIL PROTECTED]> wrote:
>> Use scan where start key is <first_half_of_key> itself as bytearray, and
>> stop key is <first_half_of_key> with last byte in bytearray + 1.
>>
>> example
>> abc% should be scan(abc, abd)
>>
>> 2010/8/17 Michelan Arendse <[EMAIL PROTECTED]>:
>>> Hi
>>>
>>> I am not sure if this is possible in HBase. What I am trying to do is scan an HBase table in a way similar to how SQL would do it.
>>> e.g. SELECT *
>>>         FROM <table>
>>>         WHERE <primary key> LIKE '<first_half_of_key>%' ;
>>>
>>> So as you can see from above I want to scan the table with only part of the row key, since the key is a combination of 2 fields in the table.
>>>
>>> Regards,
>>> Michelan Arendse
>>>
>>>
>>>
>>
>