Re: resync: todo list
Lisen Mu 2013-06-18, 16:41
It's great to see more committers on board!
It's also great to hear you are on this. If I understood the hangout
correctly, Jacques will soon provide a basic example of how POP works with
Really looking forward to your next commit.
On Tue, Jun 18, 2013 at 11:57 PM, Jason Altekruse
> Hi Lisen,
> My name is Jason Altekruse. I recently joined Jacques at MapR and have been
> working on one of the tasks you mentioned. I have implemented a basic
> optimizer that does a simple transformation from a logical to physical
> plan. With it we were able to run the mock scan operation through the full
> system. Jacques will provide a better overview of the status today at the
> hangout, but Ben Becker is working on the implementations of the
> ValueVectors, which are needed to implement the operators in the full
> execution engine. Once there are more operators available, we can extend my
> basic optimizer to include them.
> Right now I am hooking up Julian Hyde's code, which connects Drill to
> sqlline, with the full execution engine. I believe that his work might
> cover your above point about test cases for sql queries.
> - Jason
> On Tue, Jun 18, 2013 at 4:04 AM, Lisen Mu <[EMAIL PROTECTED]> wrote:
> > Hi drillers,
> > I'm back online.
> > I'm going to continue on my goal: execute a query on one drillbit first.
> > I've pulled from current github/execwork. It seems to me that the
> > work remains to be done; correct me if anything is wrong:
> > * test cases for sql query.
> > this would include join, projection, selection, grouping.
> > * nextBatch() for PhysicalOperator
> > which does the iteration over records.
> > * encoded ValueVector types
> > dictionary encoding/bit vector encoding/RLE, for strings especially, to
> > reduce memory usage.
> > * POP implementation for Join/Projection/Selection etc.
> > most importantly, with the nextBatch() method. And how would these POPs
> > cooperate with different ValueVector types, especially encoded types?
> > Anyway, I could start with simple cases first.
> > * Foreman.convert()
> > We have an Optimizer interface to do this. The Optimizer should generate
> > plan with ExchangeOps, which are the boundary of fragments. What's the
> > of generating Exchange nodes? How will clustering/schema information
> > this? anyway, I could start with simple case too: no exchange at all.
> > And further todo:
> > * performance test suites
> > I think we need some bigger data sets, ideally in JSON files and in HTable.
> > Should I include test data files in the source repository, or should I
> > generate a (predictable) data set each time at test setup? Which approach
> > do you prefer?
> > Currently, I'm willing to contribute to any of the above. If anything is
> > or anything is already done, please let me know. From tomorrow on, I
> > lay out these issues on jira and start working.
> > Thanks,
> > Lisen
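
For readers following the todo list above, the nextBatch() item can be pictured as a pull-based iteration, where each operator asks its child for the next batch of records until the child reports end of data. The sketch below is only an illustration under assumptions: the names BatchOperator, MockScan, SelectEven and drain are hypothetical and are not Drill's actual API, and a plain int[] stands in for a ValueVector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these names are illustrative, NOT Drill's actual API.
final class NextBatchSketch {

    /** Pull-based operator: each call returns the next batch, or null at end of data. */
    interface BatchOperator {
        int[] nextBatch();
    }

    /** Mock scan emitting the integers 0..total-1 in batches of at most batchSize. */
    static final class MockScan implements BatchOperator {
        private final int total;
        private final int batchSize;
        private int next = 0;

        MockScan(int total, int batchSize) {
            this.total = total;
            this.batchSize = batchSize;
        }

        @Override
        public int[] nextBatch() {
            if (next >= total) {
                return null; // end of data
            }
            int n = Math.min(batchSize, total - next);
            int[] batch = new int[n];
            for (int i = 0; i < n; i++) {
                batch[i] = next++;
            }
            return batch;
        }
    }

    /** Selection operator: keeps only the even values of each child batch. */
    static final class SelectEven implements BatchOperator {
        private final BatchOperator child;

        SelectEven(BatchOperator child) {
            this.child = child;
        }

        @Override
        public int[] nextBatch() {
            int[] in;
            // Keep pulling until a batch survives the filter or the child is exhausted,
            // so callers never receive an empty batch.
            while ((in = child.nextBatch()) != null) {
                int[] out = Arrays.stream(in).filter(v -> v % 2 == 0).toArray();
                if (out.length > 0) {
                    return out;
                }
            }
            return null;
        }
    }

    /** Drives the operator tree from the root and collects every record. */
    static List<Integer> drain(BatchOperator root) {
        List<Integer> rows = new ArrayList<>();
        int[] batch;
        while ((batch = root.nextBatch()) != null) {
            for (int v : batch) {
                rows.add(v);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(drain(new SelectEven(new MockScan(10, 4))));
    }
}
```

Running main pulls three batches from the mock scan (sizes 4, 4 and 2) through the selection and prints [0, 2, 4, 6, 8]. Projection or join operators would presumably fit the same interface, which is why starting with the simple cases seems reasonable.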