Drill >> mail # dev >> First pass at a reference interpreter


Jacques Nadeau 2013-01-14, 23:56
Re: First pass at a reference interpreter

Cool stuff, Jacques - will give it a shot ASAP!

Cheers,
Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 14 Jan 2013, at 15:56, Jacques Nadeau <[EMAIL PROTECTED]> wrote:

> I've been pulling together a reference logical plan interpreter. I'm
> working with Ted to get it into the Drill sandbox. For now, you can find
> it on the prototype branch of my repo at
> https://github.com/jacques-n/incubator-drill.
>
> The goals of the reference interpreter are:
>
>   - To provide a simple way to run a Logical Plan against some sample data
>   and get back the expected result.
>   - To allow work to start on the parsers while we scale up the performance
>   and capabilities of the execution engine and optimizer.
>   - To allow evaluation of particular technical approaches, such as
>   exploring the impact of hierarchical and schema-less data on query
>   evaluation.
>
> These goals do not include performance, memory handling, or efficiency.
> Currently, the interpreter is a single-node, single-threaded process. This
> will change shortly so that it can also run as a clustered process.
>
> The entry point is in the /sandbox/prototype/exec/ref module:
> org.apache.drill.exec.ref.ReferenceInterpreter.main(). The example program
> uses two resources, simple-plan.json and donuts.json, and writes its output
> to /opt/data/out.json.
>
> Some of the things that 'work':
>
>   - Reading and writing basic JSON.
>   - ROPs (reference operators): Filter, Transform, Group, Aggregate
>   (simple), Order, Union.
>   - Example aggregate and basic functions, including sum, count, multiply,
>   add, compare, equals.
>
> Basic glossary/concepts (we'll get this on the wiki/javadocs):
>
>   - LOP: Logical Operator. An implementation-agnostic data flow operator
>   used by the Logical Plan.
>   - ROP: Reference Operator. A reference operator implementation that
>   pairs with a LOP.
>   - FunctionDefinition: A definition of a particular function. Describes
>   a set of aliases, an allowable set of input arguments, and an interface
>   that will attempt to determine the output type.
>   - BasicEvaluator: An implementation of a particular non-aggregate
>   expression. Receives a record pointer at creation time and returns a
>   DataValue.
>   - AggregateEvaluator: An implementation of a particular aggregating
>   function. Receives a record pointer at creation time. Expects regular
>   calls to addRecord() followed by a call to eval(), which provides the
>   aggregate value.
>   - DataValue: A pointer to a particular data value. Implementation
>   classes include things like ScalarLong, ScalarBytes, SimpleMapValue and
>   SimpleArrayValue.
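The evaluator contracts in the glossary can be illustrated with a small, self-contained Java sketch. It assumes simplified, hypothetical stand-ins (`DataValue` with an `asLong()` accessor, `ScalarLong`, `RecordPointer`, `MultiplyEvaluator`, `SumEvaluator`, and the field names `price` and `quantity`); these are not the actual Drill classes. It shows the idea described above: each evaluator is bound to a record pointer once, at creation time, and reads whatever record the pointer currently references each time it is called.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins for the glossary types; not the Drill classes.
interface DataValue {
    long asLong(); // hypothetical accessor; only long values are modeled here
}

final class ScalarLong implements DataValue {
    private final long value;
    ScalarLong(long value) { this.value = value; }
    public long asLong() { return value; }
}

// Mutable pointer to "the current record": the operator moves it,
// evaluators only read through it.
final class RecordPointer {
    private Map<String, DataValue> current = new HashMap<>();
    void setCurrent(Map<String, DataValue> record) { current = record; }
    DataValue get(String field) { return current.get(field); }
}

interface BasicEvaluator {
    DataValue eval();
}

interface AggregateEvaluator {
    void addRecord();  // fold in the record the pointer currently references
    DataValue eval();  // return the aggregate value accumulated so far
}

// multiply(left, right): bound to the pointer once, re-reads it on every eval().
final class MultiplyEvaluator implements BasicEvaluator {
    private final RecordPointer record;
    private final String left;
    private final String right;

    MultiplyEvaluator(RecordPointer record, String left, String right) {
        this.record = record;
        this.left = left;
        this.right = right;
    }

    public DataValue eval() {
        return new ScalarLong(record.get(left).asLong() * record.get(right).asLong());
    }
}

// sum(child): evaluates its child expression once per addRecord() call.
final class SumEvaluator implements AggregateEvaluator {
    private final BasicEvaluator child;
    private long total = 0;

    SumEvaluator(BasicEvaluator child) { this.child = child; }

    public void addRecord() { total += child.eval().asLong(); }
    public DataValue eval() { return new ScalarLong(total); }
}

class EvaluatorDemo {
    // Computes sum(multiply(price, quantity)) over two hypothetical records.
    static long run() {
        RecordPointer pointer = new RecordPointer();
        SumEvaluator sum = new SumEvaluator(new MultiplyEvaluator(pointer, "price", "quantity"));
        long[][] rows = {{2, 3}, {4, 5}};
        for (long[] row : rows) {
            Map<String, DataValue> record = new HashMap<>();
            record.put("price", new ScalarLong(row[0]));
            record.put("quantity", new ScalarLong(row[1]));
            pointer.setCurrent(record); // advance the shared pointer...
            sum.addRecord();            // ...then let the aggregate read through it
        }
        return sum.eval().asLong();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 2*3 + 4*5 = 26
    }
}
```

Because `SumEvaluator` only sees the pointer, the same evaluator would work unchanged whether the records come from a JSON scan or from an upstream operator.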
>
> The standard record iterator used between ROPs is the
> org.apache.drill.exec.ref.RecordIterator interface. This is somewhat
> inspired by the AttributeSource concepts in the Lucene project.
> (I'm planning to extend these concepts all the way down to the individual
> DataValues.)
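A rough sketch of pull-style iteration in the spirit of Lucene's AttributeSource: the consumer fetches a shared record object once and re-reads it after each `next()` call, instead of receiving a new record object every time. `SimpleRecordIterator`, `SharedRecord`, `Outcome`, `ListScan`, `FilterIterator`, and the sample rows are hypothetical names and data for this illustration only; the real `RecordIterator` interface may differ in its methods and return types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical result of advancing an iterator.
enum Outcome { MORE, NONE }

// One shared, mutable record view. Consumers keep a single reference to it
// and re-read it after every next(), in the spirit of Lucene's AttributeSource.
final class SharedRecord {
    Map<String, Object> fields;
}

interface SimpleRecordIterator {
    SharedRecord getRecordPointer();
    Outcome next();
}

// Leaf operator: scans an in-memory list, updating the same SharedRecord each time.
final class ListScan implements SimpleRecordIterator {
    private final Iterator<Map<String, Object>> source;
    private final SharedRecord pointer = new SharedRecord();

    ListScan(List<Map<String, Object>> rows) { this.source = rows.iterator(); }

    public SharedRecord getRecordPointer() { return pointer; }

    public Outcome next() {
        if (!source.hasNext()) return Outcome.NONE;
        pointer.fields = source.next();
        return Outcome.MORE;
    }
}

// Filter operator: pulls from its input until the condition holds and exposes
// the input's pointer unchanged, so matching records are never copied.
final class FilterIterator implements SimpleRecordIterator {
    private final SimpleRecordIterator input;
    private final Predicate<SharedRecord> condition;

    FilterIterator(SimpleRecordIterator input, Predicate<SharedRecord> condition) {
        this.input = input;
        this.condition = condition;
    }

    public SharedRecord getRecordPointer() { return input.getRecordPointer(); }

    public Outcome next() {
        while (input.next() == Outcome.MORE) {
            if (condition.test(input.getRecordPointer())) return Outcome.MORE;
        }
        return Outcome.NONE;
    }
}

class IteratorDemo {
    static Map<String, Object> row(String name, long value) {
        Map<String, Object> m = new HashMap<>();
        m.put("name", name);
        m.put("value", value);
        return m;
    }

    // Returns the names of the (hypothetical) rows whose value exceeds limit.
    static List<Object> namesAbove(long limit) {
        List<Map<String, Object>> rows = List.of(row("a", 1), row("b", 7), row("c", 3));
        SimpleRecordIterator it = new FilterIterator(new ListScan(rows),
                r -> (Long) r.fields.get("value") > limit);
        SharedRecord current = it.getRecordPointer(); // fetched once, then reused
        List<Object> names = new ArrayList<>();
        while (it.next() == Outcome.MORE) {
            names.add(current.fields.get("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesAbove(2)); // [b, c]
    }
}
```

The design choice being illustrated is that operators hand downstream consumers a stable view rather than fresh objects, which keeps per-record allocation out of the operator chain.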
>
> My next goals are to add tests, finish adding ROPs, add local and remote
> exchange nodes (parallelization), add a bunch of documentation, and extract
> the execution plan into a separate intermediate representation.
>
> It needs a lot more evaluators (as well as the rest of the ROPs) to be a
> true reference interpreter. The existing ones can be used as prototypes.
> Is anyone interested in ripping through a bunch of additional evaluators
> and their associated FunctionDefinitions?
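One way a contributor might approach new functions is a registry keyed by alias, sketched below under the assumption of hypothetical `FunctionDef`, `FunctionRegistry`, and `ValueType` types (not Drill's actual FunctionDefinition API). Each definition lists its aliases, the allowed argument count, and a rule that tries to determine the output type. The `+`, `*`, and `-` aliases, the numeric widening rule, and `subtract` (standing in for a newly contributed function) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical value types; only two are modeled.
enum ValueType { LONG, DOUBLE }

// A function definition: aliases, the allowed argument count, and a rule
// that tries to determine the output type from the argument types.
final class FunctionDef {
    final List<String> aliases;
    final int minArgs;
    final int maxArgs;
    final Function<List<ValueType>, ValueType> outputType;

    FunctionDef(List<String> aliases, int minArgs, int maxArgs,
                Function<List<ValueType>, ValueType> outputType) {
        this.aliases = aliases;
        this.minArgs = minArgs;
        this.maxArgs = maxArgs;
        this.outputType = outputType;
    }

    boolean accepts(int argCount) {
        return argCount >= minArgs && argCount <= maxArgs;
    }
}

// Looks definitions up by any of their aliases.
final class FunctionRegistry {
    private final Map<String, FunctionDef> byAlias = new HashMap<>();

    void register(FunctionDef def) {
        // Check every alias first so a clash leaves the registry unchanged.
        for (String alias : def.aliases) {
            if (byAlias.containsKey(alias)) {
                throw new IllegalArgumentException("duplicate alias: " + alias);
            }
        }
        for (String alias : def.aliases) {
            byAlias.put(alias, def);
        }
    }

    ValueType resolve(String name, List<ValueType> argTypes) {
        FunctionDef def = byAlias.get(name);
        if (def == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        if (!def.accepts(argTypes.size())) {
            throw new IllegalArgumentException("bad argument count for " + name + ": " + argTypes.size());
        }
        return def.outputType.apply(argTypes);
    }
}

class RegistryDemo {
    // Numeric rule: any DOUBLE argument makes the result DOUBLE.
    static ValueType widest(List<ValueType> types) {
        return types.contains(ValueType.DOUBLE) ? ValueType.DOUBLE : ValueType.LONG;
    }

    static FunctionRegistry standard() {
        FunctionRegistry registry = new FunctionRegistry();
        registry.register(new FunctionDef(List.of("add", "+"), 2, 2, RegistryDemo::widest));
        registry.register(new FunctionDef(List.of("multiply", "*"), 2, 2, RegistryDemo::widest));
        // A newly contributed function only needs one more definition.
        registry.register(new FunctionDef(List.of("subtract", "-"), 2, 2, RegistryDemo::widest));
        return registry;
    }

    public static void main(String[] args) {
        FunctionRegistry registry = standard();
        System.out.println(registry.resolve("*", List.of(ValueType.LONG, ValueType.DOUBLE)));        // DOUBLE
        System.out.println(registry.resolve("subtract", List.of(ValueType.LONG, ValueType.LONG))); // LONG
    }
}
```

Resolving by alias keeps parser output (which may use either `multiply` or `*`) decoupled from the evaluator that eventually runs.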

Jacques Nadeau 2013-01-15, 00:06
Michael Hausenblas 2013-01-15, 00:31
Jacques Nadeau 2013-01-15, 00:37
Michael Hausenblas 2013-01-15, 01:07
Jacques Nadeau 2013-01-15, 17:26
Ted Dunning 2013-01-16, 03:22
Jacques Nadeau 2013-01-16, 03:29
Andrew Psaltis 2013-01-15, 00:07