Re: resync: todo list
Jason,

It's great to see more committers on board!

It's also great to hear you are on this. If I understood correctly from the
hangout, Jacques will soon provide a basic example of how a POP works with
value vectors.

Really looking forward to your next commit.

Thanks
On Tue, Jun 18, 2013 at 11:57 PM, Jason Altekruse
<[EMAIL PROTECTED]> wrote:

> Hi Lisen,
>
> My name is Jason Altekruse. I recently joined Jacques at MapR and have been
> working on one of the tasks you mentioned. I have implemented a basic
> optimizer that does a simple transformation from a logical to a physical
> plan. With it we were able to run the mock scan operation through the full
> system. Jacques will provide a better overview of the status today at the
> hangout, but Ben Becker is working on the implementations of the
> ValueVectors, which are needed to implement the operators in the full
> execution engine. Once there are more operators available, we can extend my
> basic optimizer to include them.
>
> Right now I am hooking up Julian Hyde's code, which connects Drill to
> sqlline, with the full execution engine. I believe that his work might
> cover your above point about test cases for sql queries.
>
> - Jason
>
>
> On Tue, Jun 18, 2013 at 4:04 AM, Lisen Mu <[EMAIL PROTECTED]> wrote:
>
> > Hi drillers,
> >
> > I'm back online.
> > I'm going to continue on my goal: execute query on one drillbit first.
> >
> > I've pulled from current github/execwork. It seems to me that the
> > following work remains to be done; correct me if anything is wrong:
> >
> > * test cases for sql query.
> > this would include join, projection, selection, grouping.
> >
> > * nextBatch() for PhysicalOperator
> > which does the iteration over records.
> >
> > * encoded ValueVector types
> > dictionary encoding/bit vector encoding/RLE, for strings especially, to
> > reduce memory usage.
> >
> > * POP implementation for Join/Projection/Selection etc.
> > most importantly, with the nextBatch() method. And how would these POPs
> > cooperate with different ValueVector types, especially encoded types?
> > Anyway, I could start with simple cases first.
> >
> > * Foreman.convert()
> > We have the Optimizer interface to do this. The Optimizer should generate
> > a physical plan with ExchangeOps, which are the boundaries of fragments.
> > What's the rule for generating Exchange nodes? How will clustering/schema
> > information affect this? Anyway, I could start with the simple case here
> > too: no exchange at all.
> >
> > And further todo:
> >
> > * performance test suites
> > I think we need some bigger data sets, ideally as a JSON file and in an
> > HTable. Shall I include test data files in the source repository, or shall
> > I generate a (predictable) data set each time at test setup? Which approach
> > do you prefer?
> >
> >
> > Currently, I'm willing to contribute to any of the above. If anything is
> > wrong or already done, please let me know. Starting tomorrow, I could
> > lay out these issues on JIRA and start working.
> >
> > Thanks,
> >
> > Lisen
>
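
The "encoded ValueVector types" item in the list above names dictionary encoding and RLE as ways to reduce memory usage for string columns. As a rough illustration of those two ideas only, here is a minimal sketch in Python. It is not Drill's ValueVector API (Drill is written in Java), and every function name below is hypothetical, chosen just for this example.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of Drill. They show the two encodings named in the thread.

def dictionary_encode(values):
    """Replace each value with a small integer code into a dictionary."""
    dictionary = []   # distinct values, in order of first appearance
    index = {}        # value -> position in `dictionary`
    codes = []        # one integer code per input value
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original values from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


def rle_encode(values):
    """Run-length encode: collapse each run of equal values to (value, count)."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((v, 1))               # start a new run
    return runs


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


if __name__ == "__main__":
    column = ["us", "us", "cn", "us", "cn", "cn"]
    dictionary, codes = dictionary_encode(column)
    runs = rle_encode(codes)
    print(dictionary, codes, runs)
```

The two encodings compose: a string column is first dictionary-encoded into small integers, and the integer codes can then be run-length encoded when equal values tend to appear in runs, for example in sorted data. How an operator would consume such encoded vectors is the open question raised in the thread, and this sketch does not address it.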