Drill >> mail # dev >> timeline for dist execution

Re: timeline for dist execution
I've not worked with Guice before.  However, I've found some DI systems
finicky, hard to debug, and slow in the past.  Can you pick a place where
you think it would make life easier/better and put together a little
example to share with the list?
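(For context, the javax.inject style discussed further down the thread is essentially constructor injection: classes depend on interfaces and never reference the container directly.  A dependency-free sketch with illustrative names, none of which come from the Drill codebase; Guice's role would be to supply the binding and do the wiring shown manually here:)

```java
// Sketch only: illustrative names, not from the Drill codebase.  In the
// javax.inject style, a class declares its dependencies through its
// constructor.  With Guice, @Inject on the constructor plus
// bind(StorageEngine.class).to(MockStorageEngine.class) in a Module
// would replace manual wiring like `new QueryExecutor(new MockStorageEngine())`.
interface StorageEngine {
    String scan(String path);
}

final class MockStorageEngine implements StorageEngine {
    @Override
    public String scan(String path) {
        return "rows-from:" + path;
    }
}

final class QueryExecutor {
    private final StorageEngine engine;

    // @javax.inject.Inject would annotate this constructor.
    QueryExecutor(StorageEngine engine) {
        this.engine = engine;
    }

    String run(String path) {
        return engine.scan(path);
    }
}
```

Because `QueryExecutor` only sees the `StorageEngine` interface, swapping a mock engine for a real one is a one-line change in the module, which is the modularization benefit being weighed here.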

Let's discuss the other items on our hangout on Tuesday.  I've tried to
invite the list so anybody should be able to join if they'd like.

I've also pushed my WIP stuff to the Apache repo so that it is more
visible.  It is in a development branch called execwork:


On Sat, Apr 13, 2013 at 4:28 PM, David Alves <[EMAIL PROTECTED]> wrote:

> Hi Jacques
>         Thank you for posting your wip and for what seems like a huge
> effort.
>         Some thoughts:
>         I think we might finally be at a place where we could
> divide and conquer and break the work into a series of reviewable and
> testable patches.
>         I can see the following, at least:
>         - rpc/serialization stuff
>         - server runtime and boot/shutdown scripts
>         - in memory data structures
>         - cluster mgmt
>         - distributed and local physical ops
>         - schema related stuff
>         - query distribution and coordination
>         I'm happy to start the breakdown/test effort and to update/create
> JIRAs as necessary.
>         Also this might be a good time to start a discussion on
> modularization.
>         It seems that currently Drill is using a mix of programmatic
> implementation loading and ad-hoc classpath scanning (which might become
> an issue in security-managed JVMs if Drill is used as a library instead
> of as a runtime).
>         Because Drill will have a lot of pluggable components at all
> sorts of levels (SEs, query executors, data formats, etc.), IMO we could
> consider moving to an externally maintained module system.
>         I've used Guice in large modularized projects with relative
> success.  I like it for its small dependency footprint, lack of XML, use
> of the javax.inject classes (meaning there is usually no need to
> reference Guice directly in application classes), and relatively small
> learning curve.
>         I really like the config format that you've chosen, and we could
> use it to configure and load modules.
>         What do you think?
> Best
> David
> On Apr 13, 2013, at 5:18 AM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:
> > You can check out some of what I've been working on my GitHub at
> > https://github.com/jacques-n/incubator-drill/tree/execwork
> >
> > Key concepts are:
> > 1) The primary in-memory data structure is a RecordBatch that contains
> > one or more fields.  Each of these fields holds a vector of values,
> > with the goal that each batch fits within a single core's L2 cache.
> > The ValueVector structures are envisioned to be language agnostic and
> > are backed by Netty4's ByteBuf abstraction.  These vector formats will
> > be thoroughly documented and not Java-centric, so that moving back and
> > forth between the native layer is reasonable.  The thinking is that
> > there will be two additional direct compression interfaces, for RLE and
> > dictionary encoding, for specialized operators that don't need fully
> > decompressed data.  This provides a compromise between excess overhead
> > due to compression-aware operators and losing out on any
> > compression-aware benefits.  As you can see, ValueVectors include
> > required (subclasses of FixedValueVector and VariableVector) and
> > nullable (a.k.a. optional) variants, and I'll be adding a Dremel-esque
> > nested/repeated value set of vectors.
> >
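(A rough sketch of the fixed-width vector idea described above.  Names are illustrative, not Drill's; the real structures are backed by Netty4's ByteBuf, with java.nio.ByteBuffer standing in here to keep the example dependency-free:)

```java
import java.nio.ByteBuffer;

// Illustrative sketch of a fixed-width value vector over contiguous
// off-heap memory.  Sizing batches to a core's L2 cache means, for
// example, that a 256 KB L2 could hold roughly 64K four-byte values.
// The real structures are backed by Netty4's ByteBuf, not ByteBuffer.
final class IntVector {
    private final ByteBuffer data;
    private final int valueCount;

    IntVector(int valueCount) {
        this.valueCount = valueCount;
        // Direct (off-heap) storage, mirroring the language-agnostic
        // layout goal: the bytes are readable without the JVM object model.
        this.data = ByteBuffer.allocateDirect(valueCount * Integer.BYTES);
    }

    void set(int index, int value) {
        data.putInt(index * Integer.BYTES, value);
    }

    int get(int index) {
        return data.getInt(index * Integer.BYTES);
    }

    int getValueCount() {
        return valueCount;
    }
}
```

A nullable (optional) variant would pair this with a validity bitmap; a variable-width variant would add an offsets vector alongside the data buffer.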
> > 2) The ByteBuf interface is also used for the protobuf-based Bit2Bit
> > and User2Bit communication.  The key is that these form a combined
> > push/pull interface that allows streaming responses as well as direct
> > transfer of ByteBufs without serialization/deserialization or excessive
> > copies (and JNI interchange with minimal overhead).
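(The zero-copy transfer point can be sketched as follows.  The framing here is hypothetical, not Drill's actual wire format, and java.nio.ByteBuffer.slice() stands in for Netty's ByteBuf.slice()/retainedSlice():)

```java
import java.nio.ByteBuffer;

// Hypothetical wire frame: [int requestId][int bodyLength][body bytes].
// The body is exposed as a slice over the transport buffer rather than
// copied out, which is the "no excessive copies" property: the response
// payload can be handed onward still backed by the original memory.
final class Frame {
    final int requestId;
    final ByteBuffer body; // shares backing memory with the transport buffer

    private Frame(int requestId, ByteBuffer body) {
        this.requestId = requestId;
        this.body = body;
    }

    static Frame decode(ByteBuffer transport) {
        int id = transport.getInt();
        int len = transport.getInt();
        ByteBuffer body = transport.slice();            // zero-copy view at current position
        body.limit(len);                                // restrict the view to this frame's body
        transport.position(transport.position() + len); // consume the body bytes
        return new Frame(id, body);
    }
}
```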
> >
> > 3) As mentioned previously on the list, the initial ClusterCoordinator is