Accumulo >> mail # user >> Is anyone using serialized iterators to provide provenance data?


Re: Is anyone using serialized iterators to provide provenance data?
Seems to me this is nothing more than "clone and also add these
per-table iterators on all scopes". Might be a neat little utility to
wrap those features into a single step from the user's perspective.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Wed, May 15, 2013 at 8:58 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> Oh, I see what you mean. Table B was created from table A with a function F
> (where F is some collection of iterators like you said).
>
> It could be a neat application of the clone command. Storing that
> information on table B is some exercise in where to put that immutable
> information (that's me ignoring that problem :P).
>
> You say git: do you actually intend to have a cheap replay ability? Or
> merely be able to view the history and be able to work through the
> transformations again?
>
> Seems reasonable for a 1.6 wish to me.
>
>
> On 05/15/2013 08:44 PM, David Medinets wrote:
>>
>> I don't see those as covering the same ground. Let's say I have an
>> Accumulo table for a given human's genome. As a scientist, I want to apply a
>> set of filters to create a subset of the genome. This provides a transform
>> from data-set A to data-set B. Since iterators were used for the transform,
>> we could serialize the set of iterators used by the transformation. Both
>> data-sets are immutable. Think git for data-sets.
>>
>>
>> On Wed, May 15, 2013 at 4:25 PM, Christopher <[EMAIL PROTECTED]> wrote:
>>
>>     I think this might relate to ACCUMULO-1397, in the form of providing a
>>     mechanism to specify iterator profiles, or ACCUMULO-415.
>>
>>     --
>>     Christopher L Tubbs II
>>     http://gravatar.com/ctubbsii
>>
>>
>>     On Wed, May 15, 2013 at 2:51 PM, David Medinets <[EMAIL PROTECTED]> wrote:
>>     > If you apply a set of iterators to one table to produce another, it seems
>>     > possible to serialize the iterator stack alongside the new table in some
>>     > catalog to provide provenance. The assumption is that the tables are
>>     > immutable, I think. Is anyone doing this or has anyone thought about doing
>>     > so? Just curious and wanted to ask before I forgot about the idea.
>>
>>
>
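
The idea discussed in this thread (record the iterator stack that turned table A into table B, and keep it in a catalog so the lineage can be walked back, "git for data-sets") could be sketched roughly as below. This is only an illustration under stated assumptions: it is plain Python with an in-memory dict standing in for the catalog, not Accumulo code, and every name in it (`IteratorStep`, `Provenance`, `record`, `lineage`, the table names, the `example.*` iterator classes) is hypothetical. It also assumes both tables are immutable, as the thread does, and that ordering the steps by priority is enough to describe the stack.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IteratorStep:
    """One iterator in the stack (hypothetical record, not an Accumulo class)."""
    priority: int
    name: str
    iterator_class: str
    options: tuple = ()  # (key, value) pairs, kept as a tuple so the record is hashable


@dataclass(frozen=True)
class Provenance:
    """Describes how derived_table was produced from source_table."""
    source_table: str
    derived_table: str
    steps: tuple

    def serialize(self) -> str:
        # Canonical JSON: steps ordered by priority, keys sorted, no extra
        # whitespace, so the same transformation always yields the same text.
        doc = {
            "source": self.source_table,
            "derived": self.derived_table,
            "steps": [
                {
                    "priority": s.priority,
                    "name": s.name,
                    "class": s.iterator_class,
                    "options": dict(s.options),
                }
                for s in sorted(self.steps, key=lambda s: s.priority)
            ],
        }
        return json.dumps(doc, sort_keys=True, separators=(",", ":"))

    def digest(self) -> str:
        # Content hash of the serialized record, usable as a stable identifier.
        return hashlib.sha256(self.serialize().encode("utf-8")).hexdigest()


def record(catalog: dict, prov: Provenance) -> None:
    """Store a provenance record; refuse to overwrite, since tables are immutable."""
    if prov.derived_table in catalog:
        raise ValueError(f"provenance already recorded for {prov.derived_table}")
    catalog[prov.derived_table] = prov


def lineage(catalog: dict, table: str) -> list:
    """Walk from a derived table back toward its root, nearest ancestor first."""
    chain, seen = [], set()
    while table in catalog and table not in seen:
        seen.add(table)  # guard against accidental cycles in the catalog
        chain.append(catalog[table])
        table = catalog[table].source_table
    return chain


catalog = {}
record(catalog, Provenance(
    "genome_A", "genome_B",
    (IteratorStep(20, "qualityFilter", "example.QualityFilter", (("min", "30"),)),
     IteratorStep(10, "regionFilter", "example.RegionFilter")),
))
record(catalog, Provenance(
    "genome_B", "genome_C",
    (IteratorStep(10, "sampleFilter", "example.SampleFilter"),),
))

for step in lineage(catalog, "genome_C"):
    print(step.derived_table, "<-", step.source_table, step.digest()[:12])
```

Whether such a record would live in table properties, a separate catalog table, or elsewhere is the open question raised above; the sketch deliberately leaves that out. Replaying a transformation would additionally require the iterator classes themselves to be available, which the serialized record only names.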