Accumulo >> mail # user >> Is anyone using serialized iterators to provide provenance data?


Re: Is anyone using serialized iterators to provide provenance data?
On Wed, May 15, 2013 at 9:15 PM, Christopher <[EMAIL PROTECTED]> wrote:

> Seems to me this is nothing more than "clone and also add these
> per-table iterators on all scopes". Might be a neat little utility to
> wrap those features into a single step from the user's perspective.

Clone has always had this.  When cloning a table, a set of props to set and
exclude (not copy from source) can be specified.  These config changes are
made before any tablet in the clone is ever brought online.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, May 15, 2013 at 8:58 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> > Oh, I see what you mean. Table B was created from table A with a function F
> > (where F is some collection of iterators like you said).
> >
> > It could be a neat application of the clone command. Storing that
> > information on table B is some exercise in where to put that immutable
> > information (that's me ignoring that problem :P).
> >
> > You say git: do you actually intend to have a cheap replay ability? Or
> > merely be able to view the history and be able to work through the
> > transformations again?
> >
> > Seems reasonable for a 1.6 wish to me.
> >
> >
> > On 05/15/2013 08:44 PM, David Medinets wrote:
> >>
> >> I don't see those as covering the same ground. Let's say I have an
> >> Accumulo table for a given human's genome. As a scientist, I want to apply a
> >> set of filters to create a subset of the genome. This provides a transform
> >> from data-set A to data-set B. Since iterators were used for the transform,
> >> we could serialize the set of iterators used by the transformation. Both
> >> data-sets are immutable. Think git for data-sets.
> >>
> >>
> >> On Wed, May 15, 2013 at 4:25 PM, Christopher <[EMAIL PROTECTED]> wrote:
> >>
> >>     I think this might relate to ACCUMULO-1397, in the form of providing a
> >>     mechanism to specify iterator profiles, or ACCUMULO-415.
> >>
> >>     --
> >>     Christopher L Tubbs II
> >>     http://gravatar.com/ctubbsii
> >>
> >>
> >>     On Wed, May 15, 2013 at 2:51 PM, David Medinets <[EMAIL PROTECTED]> wrote:
> >>     > If you apply a set of iterators to one table to produce another, it seems
> >>     > possible to serialize the iterator stack alongside the new table in some
> >>     > catalog to provide provenance. The assumption is that the tables are
> >>     > immutable, I think. Is anyone doing this or has anyone thought about doing
> >>     > so? Just curious and wanted to ask before I forgot about the idea.
> >>
> >>
> >
>
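
For readers following the "clone with props to set" point in the reply at the top of this message, here is a minimal sketch of how an iterator stack could be flattened into per-table properties for all three scopes. It is not taken from the thread: the IterSpec holder, the class name com.example.GenomeSubsetFilter, and the iterator name "subset" are hypothetical, and the property layout (table.iterator.<scope>.<name> = <priority>,<class>, plus .opt.<key> entries) is assumed to match the Accumulo version in use. The sketch uses only the JDK; passing the resulting map to the client API is shown only as a comment.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IteratorProvenance {

    /** Hypothetical holder for one iterator in the stack (not an Accumulo class). */
    static final class IterSpec {
        final int priority;
        final String name;
        final String className;
        final Map<String, String> options;

        IterSpec(int priority, String name, String className, Map<String, String> options) {
            this.priority = priority;
            this.name = name;
            this.className = className;
            this.options = options;
        }
    }

    // The three iterator scopes: scan time, minor compaction, major compaction.
    static final List<String> SCOPES = Arrays.asList("scan", "minc", "majc");

    /** Flattens the stack into table properties, one set of entries per scope. */
    static Map<String, String> toTableProperties(List<IterSpec> stack) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String scope : SCOPES) {
            for (IterSpec it : stack) {
                String key = "table.iterator." + scope + "." + it.name;
                props.put(key, it.priority + "," + it.className);
                for (Map.Entry<String, String> opt : it.options.entrySet()) {
                    props.put(key + ".opt." + opt.getKey(), opt.getValue());
                }
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("negate", "true");
        List<IterSpec> stack = Arrays.asList(
                new IterSpec(30, "subset", "com.example.GenomeSubsetFilter", opts));

        Map<String, String> propsToSet = toTableProperties(stack);
        for (Map.Entry<String, String> e : propsToSet.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
        // With the Accumulo client library on the classpath, the map would be
        // handed to the clone call as the "properties to set" argument, e.g.
        // connector.tableOperations().clone(src, dst, true, propsToSet, propsToExclude);
    }
}
```

Because the same map describes exactly what was applied to the clone, it could also be stored next to table B as the provenance record discussed earlier in the thread; where to keep that record is left open here, as it is in the discussion.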