Drill >> mail # dev >> B[yi]teSize execwork tasks someone could potentially help out with...


Re: B[yi]teSize execwork tasks someone could potentially help out with...
They are on the list but the list is long :)

Have a good weekend.

On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:

> So if no one picks anything up you will be done with all the work in the
> next couple of days? :)
>
> Would like to help out but I'm traveling to LA over the weekend.
>
> I'll sync with you Monday to see how I can help then.
>
> Tim
>
> Sent from my iPhone
>
> On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:
>
> > I'm working on the execwork stuff and if someone would like to help out,
> > here are a couple of things that need doing.  I figured I'd drop them here
> > and see if anyone wants to work on them in the next couple of days.  If so,
> > let me know; otherwise I'll be picking them up soon.
> >
> > *RPC*
> > - RPC Layer Handshakes: Currently, I haven't implemented the handshake that
> > should happen in either the User <> Bit or the Bit <> Bit layer.  The plan
> > was to use an additional inserted event handler that removes itself from
> > the event pipeline after a successful handshake or disconnects the channel
> > on a failed handshake (with appropriate logging).  The main validation at
> > this point will simply be confirming that both endpoints are running on the
> > same protocol version.  The only other information that is currently
> > needed is that in the Bit <> Bit communication, the client should
> > inform the server of its DrillEndpoint so that the server can then map that
> > for future communication in the other direction.
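
A rough sketch of the self-removing handshake handler described above, using a toy pipeline instead of a real network library so that it stands alone. Every name here (HandshakeSketch, Channel, Handler, HandshakeRequest, PROTOCOL_VERSION) is made up for illustration and is not taken from the Drill code base; the endpoint is modelled as a plain string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Drill code base. */
final class HandshakeSketch {

    /** Assumed version constant that both endpoints must agree on. */
    static final int PROTOCOL_VERSION = 1;

    /** First message a client sends after connecting. */
    static final class HandshakeRequest {
        final int version;
        final String endpoint; // client's own address; null for User <> Bit

        HandshakeRequest(int version, String endpoint) {
            this.version = version;
            this.endpoint = endpoint;
        }
    }

    interface Handler {
        /** Returns true if the message should continue to the next handler. */
        boolean onMessage(Channel channel, Object message);
    }

    /** Toy stand-in for a network channel with an ordered handler pipeline. */
    static final class Channel {
        final List<Handler> pipeline = new ArrayList<>();
        boolean open = true;

        void receive(Object message) {
            // Iterate over a copy so a handler may remove itself safely.
            for (Handler handler : new ArrayList<>(pipeline)) {
                if (!open || !handler.onMessage(this, message)) {
                    return;
                }
            }
        }
    }

    /** Sits first in the pipeline until the handshake has been validated. */
    static final class HandshakeHandler implements Handler {
        String remoteEndpoint; // recorded for later Bit <> Bit calls back

        @Override
        public boolean onMessage(Channel channel, Object message) {
            if (message instanceof HandshakeRequest
                    && ((HandshakeRequest) message).version == PROTOCOL_VERSION) {
                remoteEndpoint = ((HandshakeRequest) message).endpoint;
                channel.pipeline.remove(this); // success: step out of the way
            } else {
                System.err.println("Handshake failed; closing channel");
                channel.open = false; // failure: disconnect
            }
            return false; // handshake traffic never reaches later handlers
        }
    }
}
```

Until the handshake succeeds, anything other than a matching HandshakeRequest closes the channel, so later handlers in the pipeline never see unvalidated traffic.
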
> >
> > *DataTypes*
> > - General Expansion: Currently, we have a hodgepodge of datatypes within
> > the org.apache.drill.common.expression.types.DataType.  We need to clean
> > this up.  There should be types that map to standard SQL types.  My
> > thinking is that we should actually have separate types for each of
> > nullable, non-nullable and repeated (optional, required and repeated in
> > protobuf vernacular) since we'll generally operate with those values
> > completely differently (and that each type should reveal which it is).  We
> > should also have a relationship mapping from each to the other (e.g. how to
> > convert a signed 32 bit int into a nullable signed 32 bit int).
> >
> > - Map Types: We don't need nullable but we will need different map types:
> > inline and fieldwise.  I think these will be useful for the execution engine
> > and will be leveraged depending on the particular needs.  For example,
> > fieldwise will be a natural fit where we're operating on columnar data and
> > doing an explode or other fieldwise nested operation, and inline will be
> > useful when we're doing things like sorting a complex field.  Inline will
> > also be appropriate where we have extremely sparse record sets.  We'll just
> > need transformation methods between the two variations.  In the case of a
> > fieldwise map type field, the field is virtual and only exists to contain
> > its child fields.
> >
> > - Non-static DataTypes: We have a need for types that don't fit the static
> > data type model above.  Examples include fixed width types (e.g. 10 byte
> > string), polymorphic (inline encoded) types (number or string depending on
> > record) and repeated nested versions of our other types.  These are a
> > little more gnarly as we need to support canonicalization of these.  Optiq
> > has some methods for how to handle this kind of type system so it probably
> > makes sense to leverage that system.
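
One possible shape for the "General Expansion" item above: separate required, optional and repeated types, where each type reveals its mode and can be mapped to its sibling in another mode. This is only a sketch. The enum names and the two example base types are assumptions, not the contents of org.apache.drill.common.expression.types.DataType.

```java
/** Hypothetical sketch of static types that carry their own mode. */
final class TypeModeSketch {

    /** Protobuf-style cardinality: non-nullable, nullable, repeated. */
    enum Mode { REQUIRED, OPTIONAL, REPEATED }

    /** One constant per (base type, mode) pair; the list is illustrative. */
    enum Type {
        INT32("int32", Mode.REQUIRED),
        NULLABLE_INT32("int32", Mode.OPTIONAL),
        REPEATED_INT32("int32", Mode.REPEATED),
        VARCHAR("varchar", Mode.REQUIRED),
        NULLABLE_VARCHAR("varchar", Mode.OPTIONAL),
        REPEATED_VARCHAR("varchar", Mode.REPEATED);

        final String base; // shared by all three variants of one type
        final Mode mode;   // lets each type reveal which variant it is

        Type(String base, Mode mode) {
            this.base = base;
            this.mode = mode;
        }

        /** Relationship mapping: the same base type in another mode. */
        Type withMode(Mode target) {
            for (Type candidate : values()) {
                if (candidate.base.equals(base) && candidate.mode == target) {
                    return candidate;
                }
            }
            throw new IllegalStateException("No " + target + " variant of " + base);
        }
    }
}
```

With this layout, converting a signed 32 bit int into its nullable counterpart is Type.INT32.withMode(Mode.OPTIONAL), which yields NULLABLE_INT32.
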
> >
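
For the "Map Types" item, the transformation between the two layouts could look roughly like this. It assumes an inline map is one key/value map per record and a fieldwise map is one column per child field; both representations and all names are illustrative guesses, not Drill's actual data structures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of inline <-> fieldwise map conversion. */
final class MapLayoutSketch {

    /** Inline to fieldwise: one column per child field, null where absent. */
    static Map<String, List<Object>> toFieldwise(List<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (int row = 0; row < rows.size(); row++) {
            for (Map.Entry<String, Object> field : rows.get(row).entrySet()) {
                List<Object> column = columns.get(field.getKey());
                if (column == null) {
                    // New child field: start with a null slot for every record.
                    column = new ArrayList<>(
                            Collections.<Object>nCopies(rows.size(), null));
                    columns.put(field.getKey(), column);
                }
                column.set(row, field.getValue());
            }
        }
        return columns;
    }

    /** Fieldwise to inline: each record keeps only the fields it has. */
    static List<Map<String, Object>> toInline(
            Map<String, List<Object>> columns, int rowCount) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            Map<String, Object> record = new LinkedHashMap<>();
            for (Map.Entry<String, List<Object>> column : columns.entrySet()) {
                Object value = column.getValue().get(row);
                if (value != null) { // a null slot is treated as "field absent"
                    record.put(column.getKey(), value);
                }
            }
            rows.add(record);
        }
        return rows;
    }
}
```

The sketch treats a null slot and an absent field as the same thing, which keeps sparse records small in the inline form but would need revisiting if nested fields can be explicitly null.
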
> > *Expression Type Materialization*
> > - LogicalExpression type materialization: Right now, LogicalExpressions
> > include support for late type binding.  As part of the record batch
> > execution path, these need to get materialized with correct casting, etc
> > based on the actual found schema.  As such, we need to have a function
> > which takes a LogicalExpression tree, applies a materialized BatchSchema
> > and returns a new LogicalExpression tree with full type settings.  As part
> > of this process, all types need to be cast as necessary and full