On Wed, Apr 27, 2011 at 11:00 AM, Joe Pallas <[EMAIL PROTECTED]> wrote:
> On Apr 26, 2011, at 11:54 PM, Himanshu Vashishtha wrote:
> > HBase uses utf-8 encoding to store the row keys, so it can store
> > characters too (yes they will be larger than 1 byte).
> That statement may be misleading. HBase doesn't use any encoding at all,
> because row keys are simply arrays of bytes. HBase cares only about the
> sorting order of those byte arrays, and neither knows nor cares what
> interpretation the client may attach to them.
What I meant was that for a String like "façade" or "fad", the UTF-8 encoding
scheme is used to create those byte arrays (and therefore you can store
non-ASCII values too; each character takes 1-4 bytes, but as an end user
you don't need to care about that).
> The UTF-8 standard mentions that the byte-value lexicographic sorting order
> of UTF-8 strings matches the sorting order of the Unicode character numbers,
> so a client can turn 16- or 32-bit Unicode strings into UTF-8 in order to
> use them as keys and they will sort the same way. (Although the standard
> warns that "a sort order based on character numbers is almost never
> culturally valid.")
> On the plus side, that means you never have to worry about "What's the next
> character after ç?" Just add 1. But don't be surprised when "fad" comes
> before "façade" in your sort.
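To make the ordering point concrete, here is a minimal Python sketch (not HBase code; the word list is made up for illustration). It compares sorting by Unicode code point with sorting by the raw UTF-8 bytes, which is the order a byte-array key store would use.

```python
# Hypothetical sample keys, chosen only to illustrate the ordering.
words = ["fad", "façade", "facade", "fade"]

# Python compares str values by Unicode code point.
by_code_point = sorted(words)

# Sort by the raw UTF-8 bytes instead, as a byte-ordered store would.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# "ç" is U+00E7 and takes two bytes in UTF-8 (0xC3 0xA7);
# "d" is the single byte 0x64, so "fad" sorts before "façade".
c_cedilla_bytes = "ç".encode("utf-8")

print(by_utf8_bytes)
print(by_code_point == by_utf8_bytes)
```

Both orderings agree, and both put "fad" ahead of "façade", which is not what a French dictionary would do.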
Yes, no need to do any hard coding. Just add 1 to the last byte of the byte
array that is formed from the prefix of the key that you want to search for.
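Here is a rough sketch of that "add 1" step in plain Python rather than the HBase client API. The helper name prefix_stop_key is made up for this example. It also covers one case the sentence above glosses over: a trailing 0xFF byte cannot be incremented, so it is dropped and the byte before it is bumped instead. That never happens with UTF-8 text, since 0xFF is not a valid UTF-8 byte, but it can with arbitrary binary keys. Returning an empty result to mean "no upper bound" is an assumption of this sketch.

```python
def prefix_stop_key(prefix: bytes) -> bytes:
    """Return the smallest byte string that sorts after every key
    starting with `prefix` (hypothetical helper, not an HBase call)."""
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        # Assumption in this sketch: empty means "no upper bound".
        return b""
    # Add 1 to the last remaining byte.
    return stripped[:-1] + bytes([stripped[-1] + 1])


start = "faç".encode("utf-8")   # b"fa\xc3\xa7"
stop = prefix_stop_key(start)   # b"fa\xc3\xa8"

# Made-up keys, sorted the way a byte-ordered store would keep them.
keys = sorted(w.encode("utf-8") for w in ["fad", "façade", "faça", "fade", "faèz"])

# Keep keys in the half-open range [start, stop).
matches = [k.decode("utf-8") for k in keys if start <= k < stop]
print(matches)
```

The start and stop values would then serve as the inclusive start row and exclusive stop row of a scan over the prefix.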
Hope this is not that confusing now. :)