Accumulo >> mail # user >> Is anyone using serialized iterators to provide provenance data?


David Medinets 2013-05-15, 18:51
Christopher 2013-05-15, 20:25
David Medinets 2013-05-16, 00:44
Josh Elser 2013-05-16, 00:58
Re: Is anyone using serialized iterators to provide provenance data?
I think you've got the gist, Josh. I was thinking in terms of the gitk
utility to see data-set history, and git branch to see the list of
data-sets. Git was just a metaphor.
On Wed, May 15, 2013 at 8:58 PM, Josh Elser <[EMAIL PROTECTED]> wrote:

> Oh, I see what you mean. Table B was created from table A with a function
> F (where F is some collection of iterators like you said).
>
> It could be a neat application of the clone command. Storing that
> information on table B is some exercise in where to put that immutable
> information (that's me ignoring that problem :P).
>
> You say git: do you actually intend to have a cheap replay ability? Or
> merely be able to view the history and be able to work through the
> transformations again?
>
> Seems reasonable for a 1.6 wish to me.
>
> On 05/15/2013 08:44 PM, David Medinets wrote:
>
>> I don't see those as covering the same ground. Let's say I have an
>> Accumulo table for a given human's genome. As a scientist, I want to apply
>> a set of filters to create a subset of the genome. This provides a
>> transform from data-set A to data-set B. Since iterators were used for the
>> transform, we could serialize the set of iterators used by the
>> transformation. Both data-sets are immutable. Think git for data-sets.
>>
>>
>> On Wed, May 15, 2013 at 4:25 PM, Christopher <[EMAIL PROTECTED]> wrote:
>>
>>     I think this might relate to ACCUMULO-1397, in the form of providing a
>>     mechanism to specify iterator profiles, or ACCUMULO-415.
>>
>>     --
>>     Christopher L Tubbs II
>>     http://gravatar.com/ctubbsii
>>
>>
>>     On Wed, May 15, 2013 at 2:51 PM, David Medinets <[EMAIL PROTECTED]> wrote:
>>     > If you apply a set of iterators to one table to produce another, it seems
>>     > possible to serialize the iterator stack alongside the new table in some
>>     > catalog to provide provenance. The assumption is that the tables are
>>     > immutable, I think. Is anyone doing this or has anyone thought about doing
>>     > so? Just curious and wanted to ask before I forgot about the idea.
>>
>>
>>
>
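
The idea discussed in this thread (record which iterator stack turned table A into table B, then browse that history the way gitk shows commits) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Accumulo code: the catalog is a plain in-memory dict, and every name here (IteratorStep, ProvenanceRecord, register, history, the example.* iterator classes, the genome_* tables) is hypothetical. A real implementation would presumably capture the actual iterator settings and store the record somewhere durable, which is the open question raised above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IteratorStep:
    """One iterator in the stack; the field names are illustrative only."""
    priority: int
    name: str
    iterator_class: str
    options: dict = field(default_factory=dict)


@dataclass
class ProvenanceRecord:
    """States that derived_table = iterators applied to source_table."""
    source_table: str
    derived_table: str
    iterators: list


def serialize(record):
    """Encode a record as JSON, with iterators ordered by priority."""
    steps = sorted(record.iterators, key=lambda s: s.priority)
    return json.dumps({
        "source": record.source_table,
        "derived": record.derived_table,
        "iterators": [asdict(s) for s in steps],
    }, sort_keys=True)


def deserialize(text):
    """Rebuild a ProvenanceRecord from its JSON form."""
    data = json.loads(text)
    steps = [IteratorStep(**s) for s in data["iterators"]]
    return ProvenanceRecord(data["source"], data["derived"], steps)


def register(catalog, record):
    """Store provenance for a derived table; tables are assumed immutable,
    so an existing entry is never overwritten."""
    if record.derived_table in catalog:
        raise ValueError("provenance already recorded for " + record.derived_table)
    catalog[record.derived_table] = serialize(record)


def history(catalog, table):
    """Walk back from a table to its root, newest transformation first."""
    chain, seen = [], set()
    while table in catalog and table not in seen:
        seen.add(table)  # guard against accidental cycles
        record = deserialize(catalog[table])
        chain.append(record)
        table = record.source_table
    return chain


if __name__ == "__main__":
    catalog = {}
    register(catalog, ProvenanceRecord("genome_A", "genome_B", [
        IteratorStep(10, "chr", "example.ChromosomeFilter", {"chr": "7"}),
    ]))
    register(catalog, ProvenanceRecord("genome_B", "genome_C", [
        IteratorStep(10, "qual", "example.QualityFilter", {"min": "30"}),
    ]))
    for rec in history(catalog, "genome_C"):
        names = [s.name for s in rec.iterators]
        print(rec.derived_table, "<-", rec.source_table, "via", names)
```

Listing the keys of the catalog plays the role of "git branch" (the set of known derived data-sets), while history() plays the role of gitk. Replaying a transformation would additionally require the source table and the iterator classes to still be available.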
Christopher 2013-05-16, 01:15
Keith Turner 2013-05-16, 20:56