HBase >> mail # user >> How to scan rows starting with a particular string?


Re: How to scan rows starting with a particular string?
On Wed, Apr 27, 2011 at 11:00 AM, Joe Pallas <[EMAIL PROTECTED]> wrote:

>
> On Apr 26, 2011, at 11:54 PM, Himanshu Vashishtha wrote:
>
> > HBase uses UTF-8 encoding to store the row keys, so it can store non-ASCII
> > characters too (yes, they will be larger than 1 byte).
>
> That statement may be misleading.  HBase doesn't use any encoding at all,
> because row keys are simply arrays of bytes.  HBase cares only about the
> sorting order of those byte arrays, and neither knows nor cares what
> interpretation the client may attach to them.
What I meant was that for a String like "façade" or "fad", UTF-8 encoding is
used to create those byte arrays (so you can store non-ASCII values too; each
character takes 1-4 bytes, but as an end user you don't need to care about
that).
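To make the distinction concrete: the row key itself is only bytes, and it is the client that decides how a string becomes those bytes. The sketch below is a minimal Python illustration (not HBase client code), assuming the client encodes strings as UTF-8; the helper name `utf8_key` is hypothetical.

```python
# Minimal sketch: a row key is only a byte array; the client decides how a
# string becomes bytes. Here we assume the client encodes strings as UTF-8.

def utf8_key(s: str) -> bytes:
    """Hypothetical helper: encode a string as a UTF-8 row key."""
    return s.encode("utf-8")

for word in ["fad", "façade"]:
    key = utf8_key(word)
    # bytes.hex(" ") (Python 3.8+) prints the bytes separated by spaces.
    print(f"{word!r}: {len(word)} chars -> {len(key)} bytes ({key.hex(' ')})")

# A single character takes 1 to 4 bytes in UTF-8.
for ch in ["a", "ç", "€", "😀"]:
    print(f"{ch!r}: {len(utf8_key(ch))} byte(s)")
```

With UTF-8, "façade" is 6 characters but 7 bytes, because "ç" encodes as the two bytes c3 a7.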
>
> The UTF-8 standard mentions that the byte-value lexicographic sorting order
> of UTF-8 strings matches the sorting order of the Unicode character numbers,
> so a client can turn 16- or 32-bit Unicode strings into UTF-8 in order to
> use them as keys and they will sort the same way.  (Although the standard
> warns that "a sort order based on character numbers is almost never
> culturally valid.")
>
> On the plus side, that means you never have to worry about "What's the next
> character after ç?"  Just add 1.  But don't be surprised when "fad" comes
> before "façade" in your sort.
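The ordering claim can be checked directly. This sketch assumes row keys are compared as unsigned bytes, lexicographically (Python's `bytes` comparison works that way), and compares that order with the code-point order Python uses for `str`. The word list is arbitrary.

```python
words = ["façade", "fad", "face", "zebra", "Zebra", "été"]

# Order by raw UTF-8 bytes: Python compares bytes as unsigned values,
# lexicographically, which is the ordering assumed for row keys here.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# Order by Unicode code points: Python's default ordering for str.
by_code_points = sorted(words)

assert by_bytes == by_code_points
print(by_bytes)  # → ['Zebra', 'face', 'fad', 'façade', 'zebra', 'été']
```

Both orders agree, and neither is what a dictionary would give: "fad" lands before "façade" because "d" (0x64) is smaller than the first byte of "ç" (0xc3), and capitalized "Zebra" sorts before every lowercase word.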
>
Yes, there is no need to hard-code anything. Just add 1 to the last byte of the
byte array formed from the key prefix you want to search for, and use the result
as the stop row of the scan.
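The "add 1 to the last byte" rule can be sketched as follows. This is a minimal Python illustration, not HBase code, and the function names are hypothetical. It treats the prefix as the inclusive start row and the incremented bytes as the exclusive stop row. One case the short rule glosses over is a last byte of 0xFF, which cannot be incremented; the sketch drops trailing 0xFF bytes and carries into the previous byte. For a UTF-8-encoded prefix this never happens, since UTF-8 never produces the byte 0xFF.

```python
from typing import Optional


def prefix_stop_row(prefix: bytes) -> Optional[bytes]:
    """Hypothetical helper: exclusive stop row for a scan over all row keys
    that start with `prefix`.

    Adds 1 to the last byte. A trailing 0xFF cannot be incremented, so it is
    dropped and the carry moves to the byte before it. Returns None when the
    prefix is empty or all 0xFF, meaning "scan to the end of the table".
    """
    key = bytearray(prefix)
    while key:
        if key[-1] < 0xFF:
            key[-1] += 1
            return bytes(key)
        key.pop()  # 0xFF + 1 would overflow: drop it and carry left
    return None


def in_prefix_range(row: bytes, prefix: bytes) -> bool:
    """Hypothetical helper: is `row` inside the half-open range [prefix, stop)?"""
    stop = prefix_stop_row(prefix)
    return row >= prefix and (stop is None or row < stop)


start = "fa".encode("utf-8")
stop = prefix_stop_row(start)
print(start, stop)  # → b'fa' b'fb'
print(in_prefix_range("façade".encode("utf-8"), start))  # → True
print(in_prefix_range("fb".encode("utf-8"), start))      # → False
```

In the HBase Java client the two byte arrays would become the start and stop rows of a `Scan`; that wiring is left out here.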

Hope this is not that confusing now. :)

> joe