Keith Turner 2012-08-11, 00:07
I put together a simple abstraction layer for Accumulo that makes it easier to read and write Java objects to Accumulo key and value fields. The data written to Accumulo sorts correctly lexicographically.

I put the code on GitHub and would like some feedback on the design and whether it should be included with Accumulo.

https://github.com/keith-turner/typo

It's still a little rough and I need to add encoders for all of the primitive types.

Keith
Ed Kohlwey 2012-08-13, 00:11
I really like this. I've thought for some time that something of this sort should be part of the Accumulo core API. The inconsistent use of CharSequence, String, Text, and byte[] objects to represent the n-tuples gets very old very quickly, and distracts programmers in a multitude of ways.

The current API should really be refactored to make CharSequence, Text, byte[], and ByteBuffer types available for setting the contents of Key and Value types just for consistency's sake. It would be nice to have something like this added as well.

It would be good to see this package use nio-ish strategies to reduce the load on the garbage collector, by using buffer classes instead of arrays. But otherwise the design looks solid.
Keith Turner 2012-08-13, 15:02
On Sun, Aug 12, 2012 at 8:11 PM, Ed Kohlwey <[EMAIL PROTECTED]> wrote:
> I really like this. I've thought for some time that something of this sort should be part of the Accumulo core API. The inconsistent use of CharSequence, String, Text, and byte[] objects to represent the n-tuples gets very old very quickly, and distracts programmers in a multitude of ways.
>
> The current API should really be refactored to make CharSequence, Text, byte[], and ByteBuffer types available for setting the contents of Key and Value types just for consistency's sake. It would be nice to have something like this added as well.

I made TypoMutation extend Mutation. I wanted to make TypoKey extend Key, but could not because the return types conflicted. This could be done, but getRow(), etc. would need different names.

> It would be good to see this package use nio-ish strategies to reduce the load on the garbage collector, by using buffer classes instead of arrays. But otherwise the design looks solid.

I would definitely like to avoid allocations if possible; I will look into that.
Josh Elser 2012-08-13, 01:36
Neat idea, Keith.

Have you thought about how to support more complex types? Specifically, arrays, hashes and the nesting of those? Any thoughts about indexing for those complex types?

Initial thoughts are that it would make the most sense to place Typo at the contrib level (or something equivalent). The reason being: Typo doesn't change the underlying functionality of Accumulo; it only provides a layer on top of it that makes life easier for developers.
Keith Turner 2012-08-13, 16:06
On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> Neat idea, Keith.
>
> Have you thought about how to support more complex types? Specifically, arrays, hashes and the nesting of those? Any thoughts about indexing for those complex types?

Yeah, I was thinking that would be nice. I see a lot of users putting multiple types into the row and/or columns. Could have something like TupleEncoder<List<A>>. TupleEncoder would need to encode its elements such that they sort correctly. However, this may be cumbersome to use if you want to use different types. For example, I want a row composed of a Long and String. I was thinking of having the following types to handle this case.

    class Pair<A,B> extends LexEncoder {
      Pair(LexEncoder<A> enc1, LexEncoder<B> enc2);
      A getFirst(){}
      B getSecond(){}
    }

    class Triple<A,B,C> { /* follows same pattern as Pair */ }
    class Quadruple<A,B,C,D> { /* follows same pattern as Pair */ }

This would allow a user to write code like the following that makes it easy to work with a row composed of a Long and String.

    Pair<Long, String> pair;
    long l = pair.getFirst();
    String s = pair.getSecond();

I am still thinking the tuple concept through.

I was not considering indexing. I assume you mean creating an index in another table?

> Initial thoughts are that it would make the most sense to place Typo at the contrib level (or something equivalent). The reason being: Typo doesn't change the underlying functionality of Accumulo; it only provides a layer on top of it that makes life easier for developers.

I think putting it in contrib makes sense.
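A minimal sketch of why tuple encoding needs care, and one common way to handle it. This is not Typo's code: the class name `PairSketch` and all of its methods are hypothetical, and the scheme shown (escape 0x00 and 0x01 in each component, then join the components with a 0x00 separator) is only one possible approach. Without the separator, a String followed by a Long can sort incorrectly when one string is a prefix of another.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Typo's API.
class PairSketch {
    // Big-endian long with the sign bit flipped, so negative values sort first.
    static byte[] encodeLong(long v) {
        long flipped = v ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return out;
    }

    // Replace 0x00 with 0x01 0x01 and 0x01 with 0x01 0x02 so that a raw
    // 0x00 can be used as a separator that sorts before any escaped byte.
    static byte[] escape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == 0) { out.write(1); out.write(1); }
            else if (b == 1) { out.write(1); out.write(2); }
            else { out.write(b); }
        }
        return out.toByteArray();
    }

    // escape(first) + 0x00 + escape(second)
    static byte[] encodePair(byte[] first, byte[] second) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] f = escape(first);
        byte[] s = escape(second);
        out.write(f, 0, f.length);
        out.write(0);
        out.write(s, 0, s.length);
        return out.toByteArray();
    }

    // Convenience for a (String, long) row.
    static byte[] pair(String s, long l) {
        return encodePair(s.getBytes(StandardCharsets.UTF_8), encodeLong(l));
    }

    // Unsigned lexicographic comparison, the order a sorted key-value store uses.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

With this sketch, `pair("a", 5)` sorts before `pair("ab", -1)`, matching the order of the tuples themselves. Plain concatenation would get this case wrong, because the first byte of the encoded long (0x80) is larger than the byte for 'b'.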
Josh Elser 2012-08-13, 22:03
Even with something as simple as a pair, things can start getting difficult. I suppose it really revolves around the level of support you want to provide at scan time, e.g. "find all pairs where the second is 'x'".

Spending a few minutes thinking about it, an index could be a separate table but wouldn't necessarily have to be. It depends on the complexity of the structure you're trying to index. Using the Pair example again, you could reserve a column (family) for index records that simply invert the Pair in the colqual.
Christopher Tubbs 2012-08-13, 21:12
Am I right in assuming that this is about simplifying the API for storing typed data in the key, and not about providing a mechanism for query? Isn't this really just about storing stuff you've already decided was a good structure for whatever your query mechanism is?
Ed Kohlwey 2012-08-15, 13:19
I think it's not just about types, but specifically primitive types and tuples. So it's avoiding being a full-fledged ORM solution like Gora.
Keith Turner 2012-08-15, 13:38
On Wed, Aug 15, 2012 at 9:19 AM, Ed Kohlwey <[EMAIL PROTECTED]> wrote:
> I think its not just about types, but specifically primitive types and tuples.

Right. And sorting is another very important aspect. Users want to do things like store dates that sort in reverse order as part of a tuple in the row. We tell them it's possible if they encode their data in a certain way. And we also tell them "oh, BTW if you have binary data in your tuple it can be tricky to get it right". So one goal of Typo is to make this easier for users. I think something like the following would do this and get the lexicographic sort order correct.

    class RDTypo extends Typo<Pair<Long, Date>,String,String,Text> {
      public RDTypo() {
        super(new PairLexicoder<Long,Date>(new LongLexicoder(),
                  new ReverseLexicoder<Date>(new DateLexicoder())),
              new StringLexicoder(), new StringLexicoder(), new TextLexicoder());
      }
    }

I so wish that Java had typedef; it could make the Typo API much more concise. I never thought I would actually miss C++ template programming :) I still need to do some more research on Java generics to see if I can make things more concise.
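To make the "dates that sort in reverse order" idea concrete, here is a small sketch under stated assumptions. It is not the ReverseLexicoder or DateLexicoder from Typo; `ReverseDateSketch` and its methods are hypothetical names. It encodes the date's epoch milliseconds as a fixed-width, sign-flipped, big-endian long and then inverts every byte. Inverting bytes is enough here only because the encoding is fixed width; a variable-length encoding would also need a terminator.

```java
import java.util.Date;

// Hypothetical sketch, not Typo's API.
class ReverseDateSketch {
    // Big-endian long with the sign bit flipped, so byte order matches numeric order.
    static byte[] encodeLong(long v) {
        long flipped = v ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return out;
    }

    // Invert every byte of the ascending encoding to get descending order.
    static byte[] encodeDateReversed(Date d) {
        byte[] enc = encodeLong(d.getTime());
        for (int i = 0; i < enc.length; i++) {
            enc[i] = (byte) ~enc[i];
        }
        return enc;
    }

    // Unsigned lexicographic comparison of byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

With this encoding a newer date produces smaller bytes than an older one, so the most recent entries come first in a forward scan.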
Marc Parisi 2012-08-15, 13:45
Write an annotation called TYPEDEF that creates the source code for you at compilation. All you really need to do is extend the type to your defined name.
Ed Kohlwey 2012-08-15, 14:09
One suggestion I'd make is to force users to name their tuples by making the tuple types abstract. This won't help your complexity but IMHO makes code more readable.
This is an issue of Java style, but there's nothing more irritating than tuples floating around code without having an obvious explanation of "why do these things belong together?"
Keith Turner 2012-08-15, 16:50
On Wed, Aug 15, 2012 at 10:09 AM, Ed Kohlwey <[EMAIL PROTECTED]> wrote:
> One suggestion I'd make is to force users to name their tuples by making the tuple types abstract. This won't help your complexity but IMHO makes code more readable.

That's a good suggestion; I made Typo abstract. I also made classes like TypoScanner, TypoMutation, etc. inner classes of Typo. Doing this I was able to achieve what I wanted to with typedef, making code that uses Typo more concise. The inner classes and type parameters actually work the way I want; I was not sure they would before I tried it.
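The typedef-like effect of an abstract generic class with inner classes can be illustrated with a small sketch. The names here (`TypedTableSketch`, `Mutation`, `EdgeTable`) are hypothetical and much simpler than Typo's real classes; the point is only that a named subclass fixes the type parameters once, and the inherited inner class then carries them along.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Typo's API.
abstract class TypedTableSketch<R, V> {
    // Inner (non-static) class: it uses R and V from the enclosing type,
    // so callers never have to repeat the type arguments.
    class Mutation {
        final R row;
        final List<V> values = new ArrayList<V>();

        Mutation(R row) {
            this.row = row;
        }

        Mutation put(V value) {
            values.add(value);
            return this;
        }
    }

    Mutation newMutation(R row) {
        return new Mutation(row);
    }
}

// Naming the tuple once plays the role of a typedef.
class EdgeTable extends TypedTableSketch<Long, String> {
}
```

A caller can then write `EdgeTable.Mutation m = new EdgeTable().newMutation(42L).put("x");` and `m.row` is a `Long` without any generic arguments appearing at the call site.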
Ed Kohlwey 2012-08-16, 13:55
I started looking this morning at what it would take to change the encoders to be NIO based, and then realized that this was actually an issue in the core Accumulo API.

I think it would be nice to start a dialogue around introducing some generic superclasses to Encoder, Key, and Value in order to provide a cleaner, NIO-based API that things like Typo can be implemented on top of. I've started an issue to track thoughts on this:

https://issues.apache.org/jira/browse/ACCUMULO-731

This would also be a major help to projects like Gora.
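As a rough illustration of what a buffer-based encoder could look like, here is a sketch that writes into a caller-supplied `java.nio.ByteBuffer` instead of returning a fresh array on every call. The class and method names are hypothetical; nothing here is taken from Typo or from ACCUMULO-731.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not an Accumulo or Typo API.
class BufferEncodeSketch {
    // Writes 8 bytes at the buffer's current position. ByteBuffer is
    // big-endian by default, and flipping the sign bit keeps negative
    // values ahead of positive ones in unsigned byte order.
    static void encodeLong(long v, ByteBuffer dst) {
        dst.putLong(v ^ Long.MIN_VALUE);
    }

    // Reads 8 bytes from the buffer's current position and undoes the flip.
    static long decodeLong(ByteBuffer src) {
        return src.getLong() ^ Long.MIN_VALUE;
    }
}
```

A caller can allocate one buffer, encode several values into it, call `flip()`, and decode them back, reusing the same buffer after `clear()`.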
Keith Turner 2012-08-14, 17:29
On Mon, Aug 13, 2012 at 6:03 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> Even with something as simple as a pair, things can start getting difficult. I suppose it really revolves around the level of support you want to provide at scan time, e.g. "find all pairs where the second is 'x'?".

I implemented support for Pair and Triple. Getting the tuples to sort correctly lexicographically is tricky, which is why a library like Typo is nice. Below is a link to an example that uses Pair to store an edge in the row of the Accumulo key. The example scans over all Pairs where the first is X. This can be done efficiently by leveraging the way Pair sorts. Finding all pairs where the second is X would require a full table scan. One way to avoid this is to insert the edge twice, as Pair(X,Y) and Pair(Y,X); then you can find what you are looking for w/o a full table scan. I think this is what you mentioned below.

https://github.com/keith-turner/typo/blob/master/src/main/java/org/apache/accumulo/client/typo/example/GraphExample.java

> Spending a few minutes thinking about it, an index could be a separate table but wouldn't necessarily have to be. It depends on the complexity of the structure you're trying to index. Using the Pair example again, you could reserve a column (family) to place index records in which simply inverts the Pair in the colqual.

Right, so you could use Typo to do this but it would not do it for you.
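The "insert the edge twice" idea can be sketched without Accumulo at all. In the hypothetical example below, a `TreeSet<String>` stands in for a sorted table, a NUL character stands in for the pair separator, and `neighbors` is a prefix range scan. This is not the linked GraphExample; it assumes node names never contain the NUL character, and it does not preserve edge direction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a sorted table of Pair rows.
class EdgeIndexSketch {
    private static final char SEP = '\u0000';
    private final TreeSet<String> rows = new TreeSet<String>();

    // Store both (a,b) and (b,a) so either endpoint can be a scan prefix.
    void addEdge(String a, String b) {
        rows.add(a + SEP + b);
        rows.add(b + SEP + a);
    }

    // Range scan over rows whose first component is exactly x.
    List<String> neighbors(String x) {
        String start = x + SEP;                 // inclusive lower bound
        String end = x + (char) (SEP + 1);      // exclusive upper bound
        List<String> out = new ArrayList<String>();
        for (String row : rows.subSet(start, end)) {
            out.add(row.substring(start.length()));
        }
        return out;
    }
}
```

Because every edge is stored in both orientations, finding the pairs whose second element is X becomes the same cheap range scan as finding the pairs whose first element is X, at the cost of doubling the stored rows.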
Billie Rinaldi 2012-08-13, 16:34
Looks interesting. It would be nice to have the TypedValueCombiner use the same Encoder, which leads to the question of where we should put this. If Typo is moved to contrib, perhaps the TypedValueCombiner should be there, too. Another option might be a submodule of examples. It would be nice to have a set of standard encodings shipped with Accumulo.

I'd like to discuss the LexEncoder. It is an Encoder that preserves sort order, but it doesn't have a way to enforce or test the sorting, or even to encourage the preservation of sort order except through the javadoc. Is there anything we can do about this? We could at least make some reusable testing patterns, but it would be nice if we could do more.

Billie
Keith Turner 2012-08-13, 16:55
On Mon, Aug 13, 2012 at 12:34 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:
> Looks interesting. It would be nice to have the TypedValueCombiner use the same Encoder, which leads to the question of where we should put this.

I agree, it would be nice for it to share code w/ those iterators. Also, it would be nice to have a DisplayFormatter and Constraint support.

> If Typo is moved to contrib, perhaps the TypedValueCombiner should be there, too. Another option might be a submodule of examples. It would be nice to have a set of standard encodings shipped with Accumulo.
>
> I'd like to discuss the LexEncoder.

I am thinking of changing the name to Lexicoder.

> It is an Encoder that preserves sort order, but it doesn't have a way to enforce or test the sorting, or even to encourage the preservation of sort order except through the javadoc. Is there anything we can do about this? We could at least make some reusable testing patterns, but it would be nice if we could do more.

If you have Lexicoder<A> and A is comparable, then we could provide infrastructure to make it easy to write tests that confirm they agree. This could be completely automated if you could generate a good set of representative data for type A. For example, for Long we would at least want to test that MIN, MIN+1, -1, 0, 1, MAX-1, and MAX sort correctly lexicographically and via the comparable interface. I am not sure how we would automatically generate this set of test data for an arbitrary type though. We could make running the test simple if someone provides the data.
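A reusable check of the kind described here could look roughly like the sketch below. Everything in it is hypothetical: the `Encoder` interface is a stand-in rather than Typo's Lexicoder, and the sample values are the ones listed for Long in the message. The check compares every pair of samples both through `Comparable` and through unsigned byte order of the encodings, and reports whether the two orderings agree.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical test helper, not part of Typo or Accumulo.
class LexicoderCheckSketch {
    interface Encoder<T> {
        byte[] encode(T value);
    }

    // True when byte order of the encodings matches compareTo for all sample pairs.
    static <T extends Comparable<T>> boolean agrees(Encoder<T> enc, List<T> samples) {
        for (T a : samples) {
            for (T b : samples) {
                int expected = Integer.signum(a.compareTo(b));
                int actual = Integer.signum(compareUnsigned(enc.encode(a), enc.encode(b)));
                if (expected != actual) return false;
            }
        }
        return true;
    }

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static byte[] bigEndian(long v) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        return out;
    }

    // Plain big-endian bytes: negative values sort after positive ones, so this fails.
    static final Encoder<Long> NAIVE = v -> bigEndian(v);

    // Sign bit flipped first: byte order matches numeric order.
    static final Encoder<Long> SIGN_FLIPPED = v -> bigEndian(v ^ Long.MIN_VALUE);

    static final List<Long> LONG_SAMPLES = Arrays.asList(
            Long.MIN_VALUE, Long.MIN_VALUE + 1, -1L, 0L, 1L,
            Long.MAX_VALUE - 1, Long.MAX_VALUE);
}
```

With these samples, `agrees(SIGN_FLIPPED, LONG_SAMPLES)` holds while `agrees(NAIVE, LONG_SAMPLES)` does not, which is the kind of mistake such a harness is meant to catch. Supplying good sample data for an arbitrary type remains the caller's job.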