Drill >> mail # dev >> timeline for dist execution


Re: timeline for dist execution
You can check out some of what I've been working on in my GitHub repo at
https://github.com/jacques-n/incubator-drill/tree/execwork

Key concepts are:
1) The primary in-memory data structure is a RecordBatch that contains one
or more fields.  Each of these fields holds a vector of values, with the
goal that each batch fits within a single core's L2 cache.  The ValueVector
structures are envisioned to be language-agnostic and are backed by
Netty4's ByteBuf abstraction.  These vector formats will be thoroughly
documented and not Java-centric, so that moving data back and forth between
Java and the native layer is reasonable.  The thinking is that there will be
two additional direct compression interfaces, RLE and Dict, for specialized
operators that don't need fully decompressed data.  This provides a
compromise between excess overhead due to compression-aware operators and
losing out on any compression-aware benefits.  As you can see, ValueVectors
include required (subclasses of FixedValueVector and VariableVector) and
nullable (a.k.a. optional) variants, and I'll be adding a Dremel-esque
nested, repeated set of value vectors.
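
To make the vector layout in point 1 concrete, here is a minimal sketch, not
the code in the execwork branch.  It assumes java.nio.ByteBuffer as a
stand-in for Netty's ByteBuf so that it compiles with the JDK alone, and
every class and method name in it (FixedIntVectorSketch,
NullableIntVectorSketch, BatchSizingSketch, recordsThatFit) is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-width "required" vector: every slot holds a value. */
final class FixedIntVectorSketch {
    private final ByteBuffer data; // assumption: stand-in for Netty's ByteBuf

    FixedIntVectorSketch(int capacity) {
        // Off-heap memory with an explicit byte order, so the layout is plain
        // bytes that a native layer could read without knowing about Java objects.
        this.data = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                              .order(ByteOrder.LITTLE_ENDIAN);
    }

    void set(int index, int value) { data.putInt(index * Integer.BYTES, value); }

    int get(int index) { return data.getInt(index * Integer.BYTES); }
}

/** Hypothetical nullable vector: a validity bitmap plus a fixed-width vector. */
final class NullableIntVectorSketch {
    private final ByteBuffer validity;          // one bit per slot, 1 = present
    private final FixedIntVectorSketch values;

    NullableIntVectorSketch(int capacity) {
        this.validity = ByteBuffer.allocateDirect((capacity + 7) / 8);
        this.values = new FixedIntVectorSketch(capacity);
    }

    void set(int index, int value) {
        int byteIndex = index >>> 3;
        byte updated = (byte) (validity.get(byteIndex) | (1 << (index & 7)));
        validity.put(byteIndex, updated);
        values.set(index, value);
    }

    boolean isNull(int index) {
        return (validity.get(index >>> 3) & (1 << (index & 7))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("null at index " + index);
        }
        return values.get(index);
    }
}

/** Hypothetical helper: how many records keep a batch within a cache budget. */
final class BatchSizingSketch {
    static int recordsThatFit(int cacheBytes, int bytesPerRecord) {
        return cacheBytes / bytesPerRecord;
    }
}
```

As an illustration only: with an assumed 256 KiB L2 budget and two 4-byte
fields per record, recordsThatFit(256 * 1024, 8) suggests batches of roughly
32K records.  The real cache size and record width would depend on the
hardware and the schema.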

2) The ByteBuf interface is also used for the protobuf-based Bit2Bit and
User2Bit communication.  The key is that these are combined push/pull
interfaces, which allows streaming responses as well as direct transfer of
ByteBufs without serialization and deserialization or excessive copies
(and JNI interchange with minimal overhead).
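
The "no serialization, no extra copies" idea in point 2 could look roughly
like the sketch below.  It is only an approximation under stated
assumptions: java.nio.ByteBuffer stands in for Netty's ByteBuf, a UTF-8
string stands in for the protobuf header, and BatchFrameSketch and frame are
hypothetical names.  The push/pull side of the interface is not modeled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical framing helper: small header buffer plus the untouched body. */
final class BatchFrameSketch {

    /**
     * Returns { header, body }.  The header is a 4-byte length followed by the
     * header bytes.  The body is NOT copied: duplicate() shares its memory and
     * only gives the caller an independent position and limit.
     */
    static ByteBuffer[] frame(String header, ByteBuffer body) {
        byte[] headerBytes = header.getBytes(StandardCharsets.UTF_8);
        ByteBuffer head = ByteBuffer.allocate(Integer.BYTES + headerBytes.length);
        head.putInt(headerBytes.length);
        head.put(headerBytes);
        head.flip(); // switch from writing to reading
        return new ByteBuffer[] { head, body.duplicate() };
    }
}
```

The returned array can be handed to a java.nio.channels.GatheringByteChannel
in a single write(ByteBuffer[]) call, so the vector bytes go out as they sit
in memory rather than being re-encoded into the message.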

3) As mentioned previously on the list, the initial ClusterCoordinator
uses ZK/Curator.  I've also added a quick integration with Hazelcast
to manage things like the per-node queue depth for distributed scheduling
purposes.  This may be a bit heavy but should get us to something
functional faster.
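
For the queue-depth idea in point 3, a rough sketch of how a scheduler might
use such a shared map follows.  It assumes a local ConcurrentHashMap in
place of a Hazelcast distributed map, and QueueDepthSketch, report, remove
and leastLoaded are hypothetical names, not anything from the branch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-node queue-depth registry used to pick a target node. */
final class QueueDepthSketch {
    // Assumption: in a cluster this would be a distributed map shared by all nodes.
    private final Map<String, Integer> depthByNode = new ConcurrentHashMap<>();

    /** Each node periodically publishes how much work it has queued. */
    void report(String node, int queueDepth) { depthByNode.put(node, queueDepth); }

    /** Called when the coordinator sees a node leave the cluster. */
    void remove(String node) { depthByNode.remove(node); }

    /** Picks the node with the shallowest queue; ties go to the smaller name. */
    Optional<String> leastLoaded() {
        String best = null;
        int bestDepth = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : depthByNode.entrySet()) {
            int depth = e.getValue();
            boolean better = best == null
                || depth < bestDepth
                || (depth == bestDepth && e.getKey().compareTo(best) < 0);
            if (better) {
                best = e.getKey();
                bestDepth = depth;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The depths are only as fresh as the last report, so a scheduler built this
way would be choosing on slightly stale data; that seems acceptable for
load-spreading but is a design trade-off rather than a guarantee.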

This is heavily WIP so many things are staged but not connected yet.
 Things are broken.  And there are no tests.  But hopefully it will give
you a sense of the direction I've been headed.

I'm hoping to add some more things to this over the weekend and then we can
go through things on Tuesday.

Thanks,
Jacques

On Fri, Apr 12, 2013 at 12:56 PM, David Alves <[EMAIL PROTECTED]> wrote:

> Hi Jacques
>
>         sounds good!
>         will you still be able to post a link to your wip dist exec stuff
> before the weekend?
>         really anxious to tinker with it.
>
> Best
> David
>
> On Apr 12, 2013, at 12:24 PM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:
>
> > Looks like most people can meet at 9am PST on Tuesday.   Let's meet then.
> >
> > J
> >
> > On Mon, Apr 8, 2013 at 2:17 PM, Ted Dunning <[EMAIL PROTECTED]>
> wrote:
> >
> >> Great idea.
> >>
> >>
> >>
> >> On Mon, Apr 8, 2013 at 2:14 PM, David Alves <[EMAIL PROTECTED]>
> wrote:
> >>
> >>> Hi All
> >>>
> >>>        I took the liberty of creating a doodle for the hangout to
> >>> (hopefully) make it easier to select a time suitable for everyone.
> >>>        The link is: http://www.doodle.com/t9b5n455utkpebi3
> >>>
> >>> Best
> >>> David Alves
> >>>
> >>> On Apr 8, 2013, at 1:13 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> I'm available anytime after 1:30 pm PST M/W, and 1-4 pm PST F.
> >>>>
> >>>> Tim
> >>>>
> >>>>
> >>>> On Mon, Apr 8, 2013 at 9:01 AM, Jacques Nadeau <[EMAIL PROTECTED]>
> >>> wrote:
> >>>>
> >>>>> Given David's request to have everybody review whatever I share,
> let's
> >>> do
> >>>>> M/T/W of next week..  What times are people available?
> >>>>>
> >>>>> J
> >>>>>
> >>>>> On Sun, Apr 7, 2013 at 10:49 PM, Timothy Chen <[EMAIL PROTECTED]>
> >>> wrote:
> >>>>>
> >>>>>> I'm open 2pm pst, see when Jacques is open.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> Sent from my iPad
> >>>>>>
> >>>>>> On Apr 7, 2013, at 6:01 PM, David Alves <[EMAIL PROTECTED]>
> >> wrote:
> >>>>>>
> >>>>>>> Hi Jacques
> >>>>>>>
> >>>>>>>> I'll try to drop some of my work and thoughts on the list this
> >> week.
> >>>>>>>
> >>>>>>>  That is great news!
> >>>>>>>
> >>>>>>>> As always with these things, everything takes longer than one
> would
> >>>>>>