Accumulo, mail # user - Is anyone using serialized iterators to provide provenance data?


Re: Is anyone using serialized iterators to provide provenance data?
Keith Turner 2013-05-16, 20:56
On Wed, May 15, 2013 at 9:15 PM, Christopher <[EMAIL PROTECTED]> wrote:

> Seems to me this is nothing more than "clone and also add these
> per-table iterators on all scopes". Might be a neat little utility to
> wrap those features into a single step from the user's perspective.

Clone has always had this.  When cloning a table, a set of props to set and
exclude (not copy from source) can be specified.  These config changes are
made before any tablet in the clone is ever brought online.

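
For illustration, here is a minimal sketch of how the "props to set" for such a clone might be assembled. It assumes the usual per-table iterator property layout (`table.iterator.<scope>.<name>` holding `<priority>,<class>`, with options under `.opt.<key>`). The iterator name, class name, option, and table names are hypothetical, and the clone call itself appears only in a comment because it needs a live Accumulo connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CloneIteratorProps {
    // The three iterator scopes: scan time, minor compaction, major compaction.
    static final String[] SCOPES = {"scan", "minc", "majc"};

    // Build the table properties that configure one iterator on all scopes.
    static Map<String, String> iteratorProps(String name, int priority,
                                             String className,
                                             Map<String, String> options) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String scope : SCOPES) {
            String base = "table.iterator." + scope + "." + name;
            props.put(base, priority + "," + className);
            for (Map.Entry<String, String> opt : options.entrySet()) {
                props.put(base + ".opt." + opt.getKey(), opt.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical filter and option, for illustration only.
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("minQuality", "30");
        Map<String, String> toSet =
            iteratorProps("qualityFilter", 30, "com.example.QualityFilter", opts);

        // Properties that should NOT be copied from the source table (none here).
        Set<String> toExclude = new TreeSet<>();

        toSet.forEach((k, v) -> System.out.println(k + "=" + v));

        // With a live Connector (assumed, not shown), the clone would look like:
        // conn.tableOperations().clone("genomeA", "genomeB", true, toSet, toExclude);
    }
}
```

Because these properties are applied before any tablet of the clone comes online, the clone never serves data without the iterators in place. The same map could also be stored as the provenance record for the new table.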
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, May 15, 2013 at 8:58 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> > Oh, I see what you mean. Table B was created from table A with a function F
> > (where F is some collection of iterators like you said).
> >
> > It could be a neat application of the clone command. Storing that
> > information on table B is some exercise in where to put that immutable
> > information (that's me ignoring that problem :P).
> >
> > You say git: do you actually intend to have a cheap replay ability? Or
> > merely be able to view the history and be able to work through the
> > transformations again?
> >
> > Seems reasonable for a 1.6 wish to me.
> >
> >
> > On 05/15/2013 08:44 PM, David Medinets wrote:
> >>
> >> I don't see those as covering the same ground. Let's say I have an
> >> Accumulo table for a given human's genome. As a scientist, I want to apply a
> >> set of filters to create a subset of the genome. This provides a transform
> >> from data-set A to data-set B. Since iterators were used for the transform,
> >> we could serialize the set of iterators used by the transformation. Both
> >> data-sets are immutable. Think git for data-sets.
> >>
> >>
> >> On Wed, May 15, 2013 at 4:25 PM, Christopher <[EMAIL PROTECTED]> wrote:
> >>
> >>     I think this might relate to ACCUMULO-1397, in the form of providing a
> >>     mechanism to specify iterator profiles, or ACCUMULO-415.
> >>
> >>     --
> >>     Christopher L Tubbs II
> >>     http://gravatar.com/ctubbsii
> >>
> >>
> >>     On Wed, May 15, 2013 at 2:51 PM, David Medinets <[EMAIL PROTECTED]> wrote:
> >>     > If you apply a set of iterators to one table to produce another, it seems
> >>     > possible to serialize the iterator stack alongside the new table in some
> >>     > catalog to provide provenance. The assumption is that the tables are
> >>     > immutable, I think. Is anyone doing this or has anyone thought about doing
> >>     > so? Just curious and wanted to ask before I forgot about the idea.
> >>
> >>
> >
>