Accumulo >> mail # dev >> feedback on Typo


Keith Turner 2012-08-11, 00:07
Ed Kohlwey 2012-08-13, 00:11
Keith Turner 2012-08-13, 15:02
Josh Elser 2012-08-13, 01:36
Keith Turner 2012-08-13, 16:06

Re: feedback on Typo
Even with something as simple as a pair, things can start getting
difficult. I suppose it really comes down to the level of support you
want to provide at scan time, e.g. "find all pairs where the second is
'x'".

After spending a few minutes thinking about it: an index could be a
separate table, but it wouldn't necessarily have to be. It depends on the
complexity of the structure you're trying to index. Using the Pair example
again, you could reserve a column family for index records that simply
invert the Pair in the column qualifier.
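
A minimal sketch of that inverted-Pair idea, under loud assumptions: a
TreeMap stands in for a sorted Accumulo table, keys are flattened to
"row, family, qualifier" strings joined by a NUL character, and the family
names "data" and "idx" are made up for illustration. None of this is Typo's
or Accumulo's API, and it assumes the pair elements never contain NUL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted table; not Accumulo or Typo code.
class PairIndexSketch {
    static final char SEP = '\u0000';      // assumed separator, must not occur in the data
    static final String DATA_CF = "data";  // assumed family holding the pairs themselves
    static final String INDEX_CF = "idx";  // assumed reserved family for index records

    final TreeMap<String, String> table = new TreeMap<>();

    static String key(String row, String family, String qualifier) {
        return row + SEP + family + SEP + qualifier;
    }

    // Store the pair as (first, second) and an index record as (second, first).
    void putPair(String row, String first, String second) {
        table.put(key(row, DATA_CF, first + SEP + second), "");
        table.put(key(row, INDEX_CF, second + SEP + first), "");
    }

    // "Find all pairs where the second is x" becomes a prefix scan
    // over the index family of one row.
    List<String> firstsWhereSecondIs(String row, String second) {
        String prefix = key(row, INDEX_CF, second + SEP);
        List<String> firsts = new ArrayList<>();
        for (String k : table.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) {
                break; // sorted order: past the prefix means no more matches
            }
            firsts.add(k.substring(prefix.length()));
        }
        return firsts;
    }
}
```

The lookup only works within a single row here; an index spanning rows
would be the case where a separate table starts to look more attractive.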

On 08/13/2012 11:06 AM, Keith Turner wrote:
> On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser<[EMAIL PROTECTED]>  wrote:
>> Neat idea, Keith.
>>
>> Have you thought about how to support more complex types? Specifically,
>> arrays, hashes and the nesting of those? Any thoughts about indexing for
>> those complex types?
> Yeah, I was thinking that would be nice.  I see a lot of users putting
> multiple types into the row and/or columns.  Could have something like
> TupleEncoder<List<A>>.   TupleEncoder would need to encode its elements
> such that they sort correctly.  However, this may be cumbersome to use
> if you want to use different types.  For example, I want a row composed
> of a Long and String.  I was thinking of having the following types to
> handle this case.
>
> class Pair<A,B> extends LexEncoder {
>     Pair(LexEncoder<A> enc1, LexEncoder<B> enc2);
>     A getFirst() {}
>     B getSecond() {}
> }
>
> class Triple<A,B,C> { /* follows same pattern as Pair */ }
> class Quadruple<A,B,C,D> { /* follows same pattern as Pair */ }
>
> This would allow a user to write code like the following that makes it
> easy to work with a row composed of a Long and String.
>
> Pair<Long, String> pair;
> long l = pair.getFirst();
> String s = pair.getSecond();
>
> I am still thinking the tuple concept through.
>
> I was not considering indexing.  I assume you mean creating an index
> in another table?
>
>> Initial thoughts are that it would make the most sense to place Typo at the
>> contrib level (or something equivalent). The reason being: Typo doesn't
>> change the underlying functionality of Accumulo; it only provides a layer on
>> top of it that makes life easier for developers.
> I think putting it in contrib makes sense.
>
>>
>> On 08/10/2012 07:07 PM, Keith Turner wrote:
>>> I put together a simple abstraction layer for Accumulo that makes it
>>> easier to read and write Java objects to Accumulo key and value
>>> fields.  The data written to Accumulo sorts correctly
>>> lexicographically.
>>>
>>> I put the code on github and would like some feedback on the design
>>> and whether it should be included with Accumulo.
>>>
>>> https://github.com/keith-turner/typo
>>>
>>> It's still a little rough, and I need to add encoders for all of the
>>> primitive types.
>>>
>>> Keith
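
To make the quoted "row composed of a Long and String" case concrete, here
is a rough sketch of one way such a pair could be encoded so that its bytes
sort correctly. The class and method names are hypothetical and are not
taken from Typo. The approach assumed here is a fixed-width big-endian long
with the sign bit flipped, followed by the UTF-8 bytes of the string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the Typo implementation.
class LexPairSketch {
    // Big-endian with the sign bit flipped, so that unsigned byte-wise
    // comparison puts negative values before positive ones.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decodeLong(byte[] b, int offset) {
        return ByteBuffer.wrap(b, offset, 8).getLong() ^ Long.MIN_VALUE;
    }

    // The long has a fixed width of 8 bytes, so no separator is needed
    // before the string in this particular layout.
    static byte[] encodePair(long first, String second) {
        byte[] s = second.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + s.length).put(encodeLong(first)).put(s).array();
    }

    static long getFirst(byte[] enc) {
        return decodeLong(enc, 0);
    }

    static String getSecond(byte[] enc) {
        return new String(enc, 8, enc.length - 8, StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison of raw bytes.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return a.length - b.length;
    }
}
```

A pair whose first element is variable-width (two Strings, say) would need
some escaping or a terminator between the elements, which this sketch
sidesteps by putting the fixed-width Long first.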
Christopher Tubbs 2012-08-13, 21:12
Ed Kohlwey 2012-08-15, 13:19
Keith Turner 2012-08-15, 13:38
Marc Parisi 2012-08-15, 13:45
Ed Kohlwey 2012-08-15, 14:09
Keith Turner 2012-08-15, 16:50
Ed Kohlwey 2012-08-16, 13:55
Keith Turner 2012-08-14, 17:29
Billie Rinaldi 2012-08-13, 16:34
Keith Turner 2012-08-13, 16:55