Accumulo >> mail # user >> Is anyone using serialized iterators to provide provenance data?


Re: Is anyone using serialized iterators to provide provenance data?
Oh, I see what you mean. Table B was created from table A with a
function F (where F is some collection of iterators like you said).

It could be a neat application of the clone command. Where to store
that immutable information for table B is a separate exercise
(that's me ignoring that problem :P).

You say git: do you actually intend to have a cheap replay ability? Or
merely be able to view the history and be able to work through the
transformations again?

Seems reasonable for a 1.6 wish to me.

On 05/15/2013 08:44 PM, David Medinets wrote:
> I don't see those as covering the same ground. Let's say I have an
> Accumulo table for a given human's genome. As a scientist, I want to
> apply a set of filters to create a subset of the genome. This provides
> a transform from data-set A to data-set B. Since iterators were used
> for the transform, we could serialize the set of iterators used by the
> transformation. Both data-sets are immutable. Think git for data-sets.
>
>
> On Wed, May 15, 2013 at 4:25 PM, Christopher <[EMAIL PROTECTED]> wrote:
>
>     I think this might relate to ACCUMULO-1397, in the form of providing a
>     mechanism to specify iterator profiles, or ACCUMULO-415.
>
>     --
>     Christopher L Tubbs II
>     http://gravatar.com/ctubbsii
>
>
>     On Wed, May 15, 2013 at 2:51 PM, David Medinets <[EMAIL PROTECTED]> wrote:
>     > If you apply a set of iterators to one table to produce another, it seems
>     > possible to serialize the iterator stack alongside the new table in some
>     > catalog to provide provenance. The assumption is that the tables are
>     > immutable, I think. Is anyone doing this or has anyone thought about doing
>     > so? Just curious and wanted to ask before I forgot about the idea.
>
>
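
For illustration only, the idea discussed in this thread (recording the iterator stack that turned table A into table B in some catalog) could be sketched roughly as follows. This is plain Python with no Accumulo dependency. `IteratorSpec`, `serialize_stack`, `provenance_record`, `lineage`, and the catalog dictionary are all hypothetical names invented for this sketch, not part of any Accumulo API. It assumes, as the thread does, that both tables are immutable.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass(frozen=True)
class IteratorSpec:
    """Hypothetical stand-in for one configured iterator in a stack."""
    priority: int
    name: str
    iterator_class: str
    options: Dict[str, str]


def serialize_stack(stack: List[IteratorSpec]) -> str:
    """Serialize the stack in priority order as canonical JSON."""
    ordered = sorted(stack, key=lambda spec: spec.priority)
    return json.dumps(
        [asdict(spec) for spec in ordered],
        sort_keys=True,
        separators=(",", ":"),
    )


def provenance_record(source: str, derived: str, stack: List[IteratorSpec]) -> dict:
    """Build a catalog entry describing how `derived` was produced from `source`."""
    payload = serialize_stack(stack)
    return {
        "source": source,
        "derived": derived,
        "iterators": payload,
        # The digest lets a reader check that the stored stack was not altered.
        "digest": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


def lineage(catalog: Dict[str, dict], table: str) -> List[str]:
    """Walk back from a derived table to the table it ultimately came from."""
    chain = [table]
    while chain[-1] in catalog:
        chain.append(catalog[chain[-1]]["source"])
    return chain
```

In this sketch the catalog is just a dictionary keyed by derived table name. Following the "git for data-sets" analogy, `lineage` only answers the "view the history" question; replaying a transformation would additionally require re-applying the stored stack to the source table.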