HBase >> mail # user >> Collation order of items


Re: Collation order of items
Storing the bytes as native UTF-16 or UTF-32 will not help.  Even
strings in UTF-8 format already sort by code point when compared as
bytes.  Unfortunately, code point order is not really useful for
collation: characters like "è" (U+00E8) should appear between "e"
(U+0065) and "f" (U+0066), but the code points do not allow this.
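
To make the mismatch concrete, here is a minimal Java sketch that compares the three strings first as raw UTF-8 bytes (unsigned, byte by byte, which is how row keys are ordered) and then with java.text.Collator.  The class and method names (CollationSketch, compareUnsignedBytes, utf8, sortKey) are made up for illustration, and the French locale is only an assumption about which collation is wanted.  The sortKey helper shows one possible workaround, not something HBase provides: CollationKey.toByteArray() returns bytes whose byte-wise order matches the collator's order, so they could in principle serve as a key prefix.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Locale;

class CollationSketch {
    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Encode a string as UTF-8 bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: bytes whose byte-wise order follows the collator.
    static byte[] sortKey(Collator collator, String s) {
        return collator.getCollationKey(s).toByteArray();
    }

    public static void main(String[] args) {
        String e = "e";
        String eGrave = "\u00e8"; // "è", written as an escape to avoid source-encoding issues
        String f = "f";

        // Raw UTF-8 bytes: "è" encodes as 0xC3 0xA8, which sorts after "f" (0x66).
        System.out.println("bytes: è vs f = "
                + compareUnsignedBytes(utf8(eGrave), utf8(f)));

        // Locale-aware collation (assumed locale: French).
        Collator collator = Collator.getInstance(Locale.FRENCH);
        System.out.println("collator: e vs è = " + collator.compare(e, eGrave));
        System.out.println("collator: è vs f = " + collator.compare(eGrave, f));

        // Collation-key bytes compared byte-wise follow the collator's order.
        System.out.println("sort key: è vs f = "
                + compareUnsignedBytes(sortKey(collator, eGrave), sortKey(collator, f)));
    }
}
```

Two caveats if keys were built this way: a collation key cannot be turned back into the original string, so the original text would have to be stored alongside it (for example after the key or in a column), and the key bytes depend on the collator's rules, so they are only comparable with keys produced by the same rules.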

Thanks anyway!

--Tom

On Fri, Jun 8, 2012 at 11:14 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 8, 2012 at 9:35 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
>> Is there any way to introduce a different ordering scheme from
>> the base comparable bytes?  My use case is that I am using UTF-8 data
>> for my keys, and I would like to have scans use UTF-8 collation.
>>
>> Could this be done by providing an alternate implementation of
>> WritableComparable<ImmutableBytesWritable>?
>>
>> Thanks in advance!
>>
>
> Unfortunately no Tom.  The database is all sorted the same way.
> Different sorts per table would complicate system interactions (the
> catalog tables would have to change sort by table).  It might be
> doable but it would take some work.
>
> Can you store your data UTF-16 or UTF-32?  It's a while since I dealt
> w/ this stuff but IIRC, their sort order is byte order?  (WARNING!  I
> could be way off here).
>
> St.Ack