Re: B[yi]teSize execwork tasks someone could potentially help out with...
I've done some more work on the BitComImpl so you should probably constrain
your work to the Rpc base classes, User client and server and BitCom client
and server.

J

On Thu, Apr 25, 2013 at 9:10 PM, David Alves <[EMAIL PROTECTED]> wrote:

> Hi Jacques
>
>         I can take the RPC stuff.
>         Have you made any progress in Bit<>Bit comms?
>
> Best
> David
>
> On Apr 25, 2013, at 11:06 PM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:
>
> > I'm working on the execwork stuff and if someone would like to help out,
> > here are a couple of things that need doing.  I figured I'd drop them here
> > and see if anyone wants to work on them in the next couple of days.  If so,
> > let me know; otherwise I'll be picking them up soon.
> >
> > *RPC*
> > - RPC Layer Handshakes: Currently, I haven't implemented the handshake that
> > should happen in either the User <> Bit or the Bit <> Bit layer.  The plan
> > was to use an additional inserted event handler that removes itself from
> > the event pipeline after a successful handshake or disconnects the channel
> > on a failed handshake (with appropriate logging).  The main validation at
> > this point will be simply confirming that both endpoints are running on the
> > same protocol version.  The only other information that is currently
> > needed is that, in the Bit <> Bit communication, the client should
> > inform the server of its DrillEndpoint so that the server can then map that
> > for future communication in the other direction.
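
The handshake described above could be sketched roughly as follows. This is only an illustration in plain Java, not the actual execwork code: `HandshakeSketch`, `Handshake`, `ServerConnection`, the version constant and the use of a `String` in place of a DrillEndpoint are all hypothetical names chosen for the example, and the state transitions stand in for a pipeline handler that removes itself or closes the channel.

```java
// Hypothetical sketch; none of these names come from the Drill codebase.
final class HandshakeSketch {
    // Assumed protocol version; the real value is not given in the thread.
    static final int PROTOCOL_VERSION = 1;

    /** First message a client sends. The endpoint is only set for Bit <> Bit. */
    static final class Handshake {
        final int protocolVersion;
        final String endpoint; // stand-in for a DrillEndpoint; null for User <> Bit

        Handshake(int protocolVersion, String endpoint) {
            this.protocolVersion = protocolVersion;
            this.endpoint = endpoint;
        }
    }

    enum State { AWAITING_HANDSHAKE, ESTABLISHED, CLOSED }

    /** Server-side view of one connection. */
    static final class ServerConnection {
        private State state = State.AWAITING_HANDSHAKE;
        private String remoteEndpoint;

        /** Validates the handshake once, then leaves the handshake state for good. */
        State onHandshake(Handshake h) {
            if (state != State.AWAITING_HANDSHAKE) {
                throw new IllegalStateException("handshake already processed");
            }
            if (h.protocolVersion != PROTOCOL_VERSION) {
                // Failed handshake: log and disconnect.
                System.err.println("Protocol mismatch: expected " + PROTOCOL_VERSION
                        + " but peer sent " + h.protocolVersion);
                state = State.CLOSED;
            } else {
                // Remember who the client is so we can talk back to it later.
                remoteEndpoint = h.endpoint;
                state = State.ESTABLISHED;
            }
            return state;
        }

        String remoteEndpoint() {
            return remoteEndpoint;
        }
    }
}
```

In a real pipeline the `ESTABLISHED` transition would correspond to the handshake handler removing itself, and `CLOSED` to the channel being disconnected.
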
> >
> > *DataTypes*
> > - General Expansion: Currently, we have a hodgepodge of datatypes within
> > the org.apache.drill.common.expression.types.DataType.  We need to clean
> > this up.  There should be types that map to standard SQL types.  My
> > thinking is that we should actually have separate types for each of
> > nullable, non-nullable and repeated (optional, required and repeated in
> > protobuf vernacular) since we'll generally operate with those values
> > completely differently (and that each type should reveal which it is).  We
> > should also have a relationship mapping from each to the other (e.g. how to
> > convert a signed 32 bit int into a nullable signed 32 bit int).
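
One possible shape for this, purely as a sketch: pair each value kind with a mode so that every type reveals whether it is required, optional or repeated, and expose the mapping between modes as a method. `TypeModeSketch`, `Mode`, `Kind` and `Type` are hypothetical names for the example and are not taken from the existing DataType class.

```java
// Hypothetical sketch; names are illustrative, not the Drill type system.
final class TypeModeSketch {
    /** Protobuf-style modes: non-nullable, nullable, repeated. */
    enum Mode { REQUIRED, OPTIONAL, REPEATED }

    /** A few value kinds standing in for the standard SQL types. */
    enum Kind { INT32, INT64, FLOAT64, VARCHAR }

    /** A concrete type is a (kind, mode) pair, so each type reveals its mode. */
    static final class Type {
        final Kind kind;
        final Mode mode;

        Type(Kind kind, Mode mode) {
            this.kind = kind;
            this.mode = mode;
        }

        /** Maps this type to its counterpart in another mode. */
        Type withMode(Mode target) {
            return new Type(kind, target);
        }

        boolean isNullable() {
            return mode == Mode.OPTIONAL;
        }

        @Override
        public String toString() {
            return mode + " " + kind;
        }
    }
}
```

With this layout, converting a signed 32 bit int into its nullable form is `new TypeModeSketch.Type(Kind.INT32, Mode.REQUIRED).withMode(Mode.OPTIONAL)`; converting the values themselves would be a separate step.
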
> >
> > - Map Types: We don't need nullable but we will need different map types:
> > inline and fieldwise.  I think these will be useful for the execution engine
> > and will be leveraged depending on the particular needs.  For example,
> > fieldwise will be a natural fit where we're operating on columnar data and
> > doing an explode or other fieldwise nested operation, and inline will be
> > useful when we're doing things like sorting a complex field.  Inline will
> > also be appropriate where we have extremely sparse record sets.  We'll just
> > need transformation methods between the two variations.  In the case of a
> > fieldwise map type field, the field is virtual and only exists to contain
> > its child fields.
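
To make the inline/fieldwise distinction concrete, here is a rough sketch of the two layouts and the transformation between them, using plain Java collections rather than any real value vector classes. `MapLayoutSketch` and its methods are hypothetical, and the sketch assumes a missing field and a null value can be treated the same way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not based on actual Drill classes.
final class MapLayoutSketch {
    /** Inline (one map per record) to fieldwise (one column per child field). */
    static Map<String, List<Object>> toFieldwise(List<Map<String, Object>> records) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (int row = 0; row < records.size(); row++) {
            for (Map.Entry<String, Object> e : records.get(row).entrySet()) {
                List<Object> col = columns.get(e.getKey());
                if (col == null) {
                    col = new ArrayList<>();
                    columns.put(e.getKey(), col);
                }
                // Pad earlier rows that did not carry this field.
                while (col.size() < row) {
                    col.add(null);
                }
                col.add(e.getValue());
            }
        }
        // Pad trailing rows so every column has one slot per record.
        for (List<Object> col : columns.values()) {
            while (col.size() < records.size()) {
                col.add(null);
            }
        }
        return columns;
    }

    /** Fieldwise back to inline; null slots are dropped, which suits sparse data. */
    static List<Map<String, Object>> toInline(Map<String, List<Object>> columns, int rowCount) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            Map<String, Object> record = new LinkedHashMap<>();
            for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
                Object value = e.getValue().get(row);
                if (value != null) {
                    record.put(e.getKey(), value);
                }
            }
            records.add(record);
        }
        return records;
    }
}
```

The padding in `toFieldwise` shows why inline can be the better fit for extremely sparse record sets: the fieldwise form spends a slot per record for every field, present or not.
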
> >
> > - Non-static DataTypes: We have a need for types that don't fit the static
> > data type model above.  Examples include fixed width types (e.g. 10 byte
> > string), polymorphic (inline encoded) types (number or string depending on
> > record) and repeated nested versions of our other types.  These are a
> > little more gnarly as we need to support canonicalization of these.  Optiq
> > has some methods for how to handle this kind of type system so it probably
> > makes sense to leverage that system.
> >
> > *Expression Type Materialization*
> > - LogicalExpression type materialization: Right now, LogicalExpressions
> > include support for late type binding.  As part of the record batch
> > execution path, these need to get materialized with correct casting, etc
> > based on the actual found schema.  As such, we need to have a function
> > which takes a LogicalExpression tree, applies a materialized BatchSchema
> > and returns a new LogicalExpression tree with full type settings.  As part
> > of this process, all types need to be cast as necessary and full