NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> Collation order of items


Re: Collation order of items
Storing the bytes as native UTF-16 or UTF-32 will not help.  Even
strings in UTF-8 format sort by code point when their bytes are
compared.  Unfortunately, that's not really useful for collation:
characters like "è" (U+00E8) should appear between "e" (U+0065) and
"f" (U+0066), but code point order does not allow this.
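The gap between byte order and collation can be checked in Java. The sketch below assumes Java 9+ (for `Arrays.compareUnsigned`) and uses an English-locale `java.text.Collator` as a stand-in for whatever collation the keys actually need. It sorts the same three strings three ways: by unsigned UTF-8 bytes, with the collator, and by the unsigned bytes of each string's `CollationKey`. The class and method names are hypothetical. The third variant was not proposed in the thread. It only illustrates that the `CollationKey.toByteArray()` javadoc promises byte arrays that compare the same way the keys themselves do.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationDemo {

    // Sort by the unsigned bytes of the UTF-8 encoding, which is plain
    // code point order.
    static List<String> byUtf8Bytes(List<String> words) {
        List<String> out = new ArrayList<>(words);
        out.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8)));
        return out;
    }

    // Sort with a locale-sensitive Collator (English locale is an assumption).
    static List<String> byCollator(List<String> words) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> out = new ArrayList<>(words);
        out.sort(collator);
        return out;
    }

    // Sort by the unsigned bytes of each CollationKey. Because these byte
    // arrays compare like the keys, they could in principle act as a
    // byte-ordered key prefix (hypothetical workaround, not from the thread).
    static List<String> byCollationKeyBytes(List<String> words) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> out = new ArrayList<>(words);
        out.sort((a, b) -> Arrays.compareUnsigned(
                collator.getCollationKey(a).toByteArray(),
                collator.getCollationKey(b).toByteArray()));
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("f", "\u00E8", "e");
        System.out.println(byUtf8Bytes(words));
        System.out.println(byCollator(words));
        System.out.println(byCollationKeyBytes(words));
    }
}
```

The byte-order sort places "è" after "f" (its UTF-8 lead byte 0xC3 is larger than 0x66), while the collator and the collation-key bytes both place it between "e" and "f". Collation keys cannot in general be decoded back into the original string, and their bytes depend on the collator's rules, so any such prefix would still need the original string stored alongside it.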

Thanks anyway!

--Tom

On Fri, Jun 8, 2012 at 11:14 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 8, 2012 at 9:35 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
>> Is there any way to introduce a different ordering scheme from
>> the base comparable bytes?  My use case is that I am using UTF-8 data
>> for my keys, and I would like to have scans use UTF-8 collation.
>>
>> Could this be done by providing an alternate implementation of
>> WritableComparable<ImmutableBytesWritable>?
>>
>> Thanks in advance!
>>
>
> Unfortunately no Tom.  The database is all sorted the same way.
> Different sorts per table would complicate system interactions (the
> catalog tables would have to change sort by table).  It might be
> doable but it would take some work.
>
> Can you store your data as UTF-16 or UTF-32?  It's been a while since I dealt
> w/ this stuff but IIRC, their sort order is byte order?  (WARNING!  I
> could be way off here).
>
> St.Ack
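Stack's guess about UTF-16 and UTF-32 can be tested against a single, table-independent ordering: lexicographic comparison of raw bytes, each treated as unsigned, roughly what HBase's `Bytes.compareTo` does for row keys. The sketch assumes Java 9+. `compareKeys`, `utf32be` and the other helpers are hypothetical names, and UTF-32BE is encoded by hand so the example does not rely on an optional JDK charset. UTF-8 and UTF-32BE preserve code point order. UTF-16BE does not once a supplementary character is involved, because its surrogate code units (0xD800 to 0xDFFF) sort below BMP characters such as U+FF21. None of the three puts "è" before "f", which is Tom's point.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingOrder {

    // Lexicographic comparison treating every byte as unsigned (0..255):
    // one ordering for all keys, independent of the table.
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE); // no byte order mark
    }

    // Hand-rolled UTF-32BE: four big-endian bytes per code point.
    static byte[] utf32be(String s) {
        int[] cps = s.codePoints().toArray();
        byte[] out = new byte[cps.length * 4];
        for (int i = 0; i < cps.length; i++) {
            out[4 * i]     = (byte) (cps[i] >>> 24);
            out[4 * i + 1] = (byte) (cps[i] >>> 16);
            out[4 * i + 2] = (byte) (cps[i] >>> 8);
            out[4 * i + 3] = (byte) cps[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String fullwidthA = "\uFF21";                          // U+FF21
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600
        // In code point order, fullwidthA comes before emoji.
        System.out.println(compareKeys(utf8(fullwidthA), utf8(emoji)) < 0);       // true
        System.out.println(compareKeys(utf32be(fullwidthA), utf32be(emoji)) < 0); // true
        System.out.println(compareKeys(utf16be(fullwidthA), utf16be(emoji)) < 0); // false
        // No encoding places U+00E8 before "f" (U+0066).
        System.out.println(compareKeys(utf32be("\u00E8"), utf32be("f")) < 0);     // false
    }
}
```

So UTF-32BE would give code point order (as UTF-8 already does), but code point order is still not collation order.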