-
Suggestions on modeling a composite row key
Mike Hugo 2013-02-27, 04:51
I need to build up a row key that consists of two parts, the first being a URL (e.g. http://foo.com/dir/page%20name.htm) and the second being a number (e.g. "12").

To date we've been using \u0000 to delimit these two pieces of the key, but that has some headaches associated with it.

I'm curious to know how other people have delimited composite row keys. Any best practices or suggestions?

Thanks,
Mike
-
Re: Suggestions on modeling a composite row key
Christopher 2013-02-27, 08:03
Check out Typo: https://github.com/keith-turner/typo
What you're describing is the motivation for that little utility API.

Alternatively, if you don't care about the overhead costs or human readability, you could use a modified base64 encoding of your binary key components that preserves the ordering (such as http://iharder.sourceforge.net/current/java/base64/ which I found with Google just now), encode them individually, and join them using a delimiter of your choosing (so long as your delimiter sorts lexicographically before every byte that your order-preserving encoding can output).

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
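To make the "encode individually, then join with a low-sorting delimiter" idea concrete, here is a minimal sketch. It swaps in lowercase hex for the modified base64 mentioned above, since hex also preserves unsigned byte order and is easy to verify by eye. The class name, method names, and the choice of '!' as the delimiter are assumptions for illustration, not part of Typo or the linked base64 library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: order-preserving hex encoding joined by a low-sorting delimiter.
final class HexCompositeKey {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // '!' (0x21) sorts before every character the hex encoding can emit
    // ('0'..'9' are 0x30..0x39, 'a'..'f' are 0x61..0x66).
    static final char DELIMITER = '!';

    // Encode each byte as two lowercase hex digits, high nibble first.
    // The mapping is monotonic, so unsigned byte order is preserved.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
        }
        return sb.toString();
    }

    // Encode the two components separately and join them with the delimiter.
    static String rowKey(byte[] first, byte[] second) {
        return hex(first) + DELIMITER + hex(second);
    }

    public static void main(String[] args) {
        byte[] url = "http://foo.com/dir/page%20name.htm".getBytes(StandardCharsets.UTF_8);
        byte[] number = "12".getBytes(StandardCharsets.UTF_8);
        System.out.println(rowKey(url, number));
    }
}
```

Because the delimiter sorts below every encoded character, a key whose first component is a prefix of another's first component still sorts first, whatever the second components are. The cost is that every component doubles in size and is no longer readable in the shell.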
-
Re: Suggestions on modeling a composite row key
Jared Winick 2013-02-27, 15:30
And if you weren't already aware, if you do something like Christopher mentions, or anything that makes your Keys less than human friendly, check out the Formatter interface:
http://accumulo.apache.org/1.4/apidocs/org/apache/accumulo/core/util/format/Formatter.html
This will let you write a Formatter to turn the keys back into a human-readable format in the shell (type "formatter --help" in the shell for more info).
-
Re: Suggestions on modeling a composite row key
Adam Fuchs 2013-02-27, 15:44
At sqrrl, we tend to use a Tuple class that implements List<String> (List<ByteBuffer> would also work), and has conversions to and from ByteBuffer.

To encode the tuple into a byte buffer, change all the "\1"s to "\1\2", change all the "\0"s to "\1\1", and put a "\0" byte between elements. "\1" is used as an escape character for all of the "\1"s and "\0"s appearing in the unencoded form. To decode, just split on "\0" and reverse the escaping.

This encoding preserves hierarchical, lexicographical ordering of tuple elements.

Cheers,
Adam
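The escaping scheme described above can be sketched as follows. This is not sqrrl's Tuple class; the class and method names are hypothetical, and it works on a List<byte[]> rather than List<String> or ByteBuffer to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for the escaping rules described in the message above.
final class TupleCodec {

    // 0x00 becomes 0x01 0x01, 0x01 becomes 0x01 0x02,
    // and a raw 0x00 is written between elements.
    static byte[] encode(List<byte[]> elements) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) {
                out.write(0x00); // element separator
            }
            for (byte b : elements.get(i)) {
                if (b == 0x00) {
                    out.write(0x01);
                    out.write(0x01);
                } else if (b == 0x01) {
                    out.write(0x01);
                    out.write(0x02);
                } else {
                    out.write(b);
                }
            }
        }
        return out.toByteArray();
    }

    // Split on raw 0x00 and undo the escaping.
    // Note: an empty input decodes to a single empty element.
    static List<byte[]> decode(byte[] encoded) {
        List<byte[]> elements = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length; i++) {
            byte b = encoded[i];
            if (b == 0x00) {
                elements.add(current.toByteArray());
                current.reset();
            } else if (b == 0x01) {
                if (i + 1 >= encoded.length) {
                    throw new IllegalArgumentException("dangling escape byte");
                }
                byte next = encoded[++i];
                if (next == 0x01) {
                    current.write(0x00);
                } else if (next == 0x02) {
                    current.write(0x01);
                } else {
                    throw new IllegalArgumentException("invalid escape sequence");
                }
            } else {
                current.write(b);
            }
        }
        elements.add(current.toByteArray());
        return elements;
    }
}
```

After escaping, a raw 0x00 can only appear as a separator, so splitting is unambiguous. The separator is also the smallest possible byte, and the escaped forms keep their relative order (0x01 0x01 sorts before 0x01 0x02, which sorts before 0x02), which is why element-by-element ordering survives the encoding.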
-
Re: Suggestions on modeling a composite row key
Keith Turner 2013-02-27, 15:50
On Wed, Feb 27, 2013 at 3:03 AM, Christopher <[EMAIL PROTECTED]> wrote:
> Check out Typo: https://github.com/keith-turner/typo
> What you're describing is the motivation for that little utility API.

Also, you do not have to use the Typo API. You could use just the Lexicoders that you need in order to encode things so that they sort properly lexicographically.

https://github.com/keith-turner/typo/blob/master/src/main/java/org/apache/accumulo/typo/encoders/Lexicoder.java
https://github.com/keith-turner/typo/blob/master/src/main/java/org/apache/accumulo/typo/encoders/PairLexicoder.java
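For the numeric half of the key, the reason an order-preserving encoder matters is that the string "12" sorts before "2" under plain lexicographic comparison. The sketch below shows one common way to encode a signed long so that unsigned byte order matches numeric order: flip the sign bit and write the value big-endian. It is an illustration of the idea only, not the code in the Typo repository; the class and method names are assumptions.

```java
// Hypothetical sketch of an order-preserving encoding for signed longs.
final class LongOrderEncoder {

    // Flip the sign bit so negatives sort before positives,
    // then write the 8 bytes most-significant first.
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return out;
    }

    // Reassemble the big-endian bytes and flip the sign bit back.
    static long decode(byte[] bytes) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[i] & 0xffL);
        }
        return v ^ Long.MIN_VALUE;
    }

    // Compare byte arrays as unsigned bytes, the way sorted row keys compare.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return a.length - b.length;
    }
}
```

Because the output is always 8 bytes, it can be appended after a delimiter or an escaped first component without needing a terminator of its own.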
-
Re: Suggestions on modeling a composite row key
Mike Hugo 2013-02-27, 16:57
Excellent, thanks everyone for all the suggestions!

Mike