Accumulo >> mail # dev >> feedback on Typo
Re: feedback on Typo
Even with something as simple as a pair, things can start getting
difficult. I suppose it really revolves around the level of support you
want to provide at scan time, e.g. "find all pairs where the second
element is 'x'".

Spending a few minutes thinking about it, an index could be a separate
table but wouldn't necessarily have to be. It depends on the complexity
of the structure you're trying to index. Using the Pair example again,
you could reserve a column family for index records, each of which
simply stores the Pair inverted in the column qualifier (colqual).
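
To make that concrete, here is a rough sketch of the idea. It does not use
the Accumulo or Typo APIs: a TreeMap stands in for the sorted columns of a
table, and the names (PairIndexSketch, the "data" and "idx" families, the
NUL separator) are all made up for illustration. A real encoder would also
have to escape the separator inside the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: a TreeMap stands in for Accumulo's sorted key space.
class PairIndexSketch {
    // Hypothetical separator between family and pair elements.
    static final char SEP = '\u0000';

    // Sorted "column" -> value, like family + qualifier within one table.
    final TreeMap<String, String> columns = new TreeMap<>();

    // Write the pair in its natural order, plus an inverted index record
    // under a reserved family so lookups by the second element are possible.
    void put(String first, String second, String value) {
        columns.put("data" + SEP + first + SEP + second, value);
        columns.put("idx" + SEP + second + SEP + first, "");
    }

    // "Find all pairs where the second is 'x'" becomes a prefix scan
    // over the reserved index family.
    List<String> firstsWhereSecondIs(String second) {
        String prefix = "idx" + SEP + second + SEP;
        List<String> firsts = new ArrayList<>();
        for (String key : columns.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // walked past the end of the matching range
            }
            firsts.add(key.substring(prefix.length()));
        }
        return firsts;
    }
}
```

The trade-off is the usual one for inverted indexes: every write costs two
entries, and the index records have to be kept in step with the data.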

On 08/13/2012 11:06 AM, Keith Turner wrote:
> On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser<[EMAIL PROTECTED]>  wrote:
>> Neat idea, Keith.
>>
>> Have you thought about how to support more complex types? Specifically,
>> arrays, hashes and the nesting of those? Any thoughts about indexing for
>> those complex types?
> Yeah I was thinking that would be nice.  I see a lot of users putting
> multiple types into the row and/or columns.  Could have something like
> TupleEncoder<List<A>>.   TupleEncoder would need to encode its elements
> such that it sorts correctly.  However, this may be cumbersome to use
> if you want to use different types.  For example I want a row composed
> of a Long and String.  I was thinking of having the following types to
> handle this case.
>
> class Pair<A,B>  extends LexEncoder{
>     Pair(LexEncoder<A>  enc1, LexEncoder<B>  enc2);
>     A getFirst(){}
>     B getSecond(){}
> }
>
> class Triple<A,B,C>{//follows same pattern as Pair}
> class Quadruple<A,B,C,D>{//follows same pattern as Pair}
>
> This would allow a user to write code like the following that makes it
> easy to work with a row composed of a Long and String.
>
> Pair<Long, String>  pair;
> long l = pair.getFirst();
> String s = pair.getSecond();
>
> I am still thinking the tuple concept through.
>
> I was not considering indexing.  I assume you mean creating an index
> in another table?
>
>> Initial thoughts are that it would make the most sense to place Typo at the
>> contrib level (or something equivalent). The reason being: Typo doesn't
>> change the underlying functionality of Accumulo; it only provides a layer on
>> top of it that makes life easier for developers.
> I think putting it in contrib makes sense.
>
>>
>> On 08/10/2012 07:07 PM, Keith Turner wrote:
>>> I put together a simple abstraction layer for Accumulo that makes it
>>> easier to read and write Java objects to Accumulo key and value
>>> fields.  The data written to Accumulo sorts correctly
>>> lexicographically.
>>>
>>> I put the code on github and would like some feedback on the design
>>> and whether it should be included with Accumulo.
>>>
>>> https://github.com/keith-turner/typo
>>>
>>> It's still a little rough and I need to add encoders for all of the
>>> primitive types.
>>>
>>> Keith