Accumulo >> mail # user >> Suggestions on modeling a composite row key


Re: Suggestions on modeling a composite row key
On Wed, Feb 27, 2013 at 3:03 AM, Christopher <[EMAIL PROTECTED]> wrote:
> Check out Typo: https://github.com/keith-turner/typo
> What you're describing is the motivation for that little utility API.

Also, you do not have to use the whole Typo API.  You could use just
the Lexicoders you need in order to encode your key components so that
they sort properly lexicographically.

https://github.com/keith-turner/typo/blob/master/src/main/java/org/apache/accumulo/typo/encoders/Lexicoder.java
https://github.com/keith-turner/typo/blob/master/src/main/java/org/apache/accumulo/typo/encoders/PairLexicoder.java
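
As a rough illustration of what "sorts properly lexicographically" means
for a (URL, number) row key, here is a minimal self-contained sketch. It
does not use Typo. The class and method names (CompositeKeySketch, escape,
encodeLong, encodeKey) are made up for this example, and the escaping
scheme is just one common way to do it, not necessarily what the
Lexicoders linked above do internally.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Typo or Accumulo.
final class CompositeKeySketch {

    // Escape 0x00 and 0x01 so the encoded field never contains a raw 0x00:
    // 0x00 -> 0x01 0x01, 0x01 -> 0x01 0x02, every other byte is unchanged.
    // This mapping keeps the unsigned byte order of the input.
    static byte[] escape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 2);
        for (byte b : in) {
            if (b == 0x00) {
                out.write(0x01);
                out.write(0x01);
            } else if (b == 0x01) {
                out.write(0x01);
                out.write(0x02);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Fixed-width big-endian long with the sign bit flipped, so that
    // negative values sort before positive ones as unsigned bytes.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Row key = escaped URL bytes, a 0x00 delimiter, then the encoded number.
    // The number is the last field and fixed width, so it is left unescaped.
    static byte[] encodeKey(String url, long n) {
        byte[] first = escape(url.getBytes(StandardCharsets.UTF_8));
        byte[] second = encodeLong(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(first, 0, first.length);
        out.write(0x00);
        out.write(second, 0, second.length);
        return out.toByteArray();
    }

    // Compare byte arrays as unsigned bytes, shorter array first on a tie.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

Because the number is fixed width and binary, 2 sorts before 12 (unlike
the strings "2" and "12"), and because the URL is escaped, a 0x00 inside
the URL cannot be mistaken for the delimiter.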
>
> Alternatively, if you don't care about the overhead costs or human
> readability, you could use a modified base64 encoding of your binary
> key components that preserves the ordering (such as
> http://iharder.sourceforge.net/current/java/base64/ which I found with
> Google just now), encode them individually, and join them using a
> delimiter of your choosing (so long as your delimiter is
> lexicographically ordered prior to all the bytes in the output bytes
> of your order-preserving encoding).
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Tue, Feb 26, 2013 at 8:51 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>> I need to build up a row key that consists of two parts, the first being a
>> URL (e.g. http://foo.com/dir/page%20name.htm) and the second being a number
>> (e.g. "12").
>>
>> To date we've been using \u0000 to delimit these two pieces of the key, but
>> that has some headaches associated with it.
>>
>> I'm curious to know how other people have delimited composite row keys.  Any
>> best practices or suggestions?
>>
>> Thanks,
>>
>> Mike