Re: resync: todo list
Jason Altekruse 2013-06-18, 15:57
My name is Jason Altekruse. I recently joined Jacques at MapR and have been
working on one of the tasks you mentioned. I have implemented a basic
optimizer that does a simple transformation from a logical to a physical
plan. With it we were able to run the mock scan operation through the full
system. Jacques will give a better overview of the status at today's
hangout, but Ben Becker is working on the implementations of the
ValueVectors, which are needed to implement the operators in the full
execution engine. Once more operators are available, we can extend my
basic optimizer to include them.
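To make the idea of a simple logical-to-physical transformation concrete, here is a minimal sketch. It is written in Python for brevity and is not Drill's Optimizer code: the node classes (LogicalScan, LogicalFilter, PhysicalScan, PhysicalFilter) and the to_physical function are hypothetical names chosen for illustration. It assumes a one-to-one mapping from each logical node to a physical node and fails loudly for operators that have no physical counterpart yet.

```python
from dataclasses import dataclass


# Hypothetical plan node types for illustration only; not Drill's classes.
@dataclass
class LogicalScan:
    source: str


@dataclass
class LogicalFilter:
    child: object
    predicate: str


@dataclass
class PhysicalScan:
    source: str


@dataclass
class PhysicalFilter:
    child: object
    predicate: str


def to_physical(node):
    """Map each logical node to one physical node, recursing into children."""
    if isinstance(node, LogicalScan):
        return PhysicalScan(node.source)
    if isinstance(node, LogicalFilter):
        return PhysicalFilter(to_physical(node.child), node.predicate)
    # Operators without a physical implementation are rejected explicitly.
    raise NotImplementedError(
        f"no physical operator yet for {type(node).__name__}"
    )
```

Supporting a new operator in a scheme like this would mean adding one more branch to to_physical once the physical implementation exists.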
Right now I am hooking up Julian Hyde's code, which connects Drill to
sqlline, with the full execution engine. I believe that his work might
cover your point above about test cases for SQL queries.
On Tue, Jun 18, 2013 at 4:04 AM, Lisen Mu <[EMAIL PROTECTED]> wrote:
> Hi drillers,
> I'm back online.
> I'm going to continue on my goal: execute query on one drillbit first.
> I've pulled from current github/execwork. It seems to me that the following
> work remains to be done; correct me if anything is wrong:
> * test cases for sql query.
> this would include join, projection, selection, grouping.
> * nextBatch() for PhysicalOperator
> which does the iteration over records.
> * encoded ValueVector types
> dictionary encoding/bit vector encoding/RLE, for strings especially, to
> reduce memory usage.
> * POP implementation for Join/Projection/Selection etc.
> most importantly, with the nextBatch() method. And, how would these POP
> cooperate with different ValueVector types, especially encoded types?
> anyway, I could start with simple cases first.
> * Foreman.convert()
> We have the Optimizer interface to do this. The Optimizer should generate a
> physical plan with ExchangeOps, which are the boundaries of fragments. What
> are the rules for generating Exchange nodes? How will clustering/schema
> information affect this? Anyway, I could start with a simple case too: no
> exchange at all.
> And further todo:
> * performance test suites
> I think we need a bigger data set, ideally as a JSON file and in an HTable.
> Shall I include a test data file in the source repository, or shall I generate
> (predictable) data set each time at test setup? Which approach do you
> Currently, I'm willing to contribute to any of the above. If anything is wrong
> or anything is already done, please let me know. From tomorrow on, I could
> lay out these issues on jira and start working.
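
For reference, the nextBatch() and encoded-vector items in the list above can be sketched together as a simple pull model. This is a hedged illustration in Python, not the execwork API: DictionaryVector, MockScan, Selection, and next_batch are hypothetical names, and the sketch assumes each operator returns one batch per call and None when its input is exhausted. The selection compares integer codes instead of strings, which is one way an operator might cooperate with a dictionary-encoded type.

```python
class DictionaryVector:
    """Strings stored once in a dictionary; rows hold small integer codes."""

    def __init__(self, values):
        self.dictionary = []  # distinct values, in first-seen order
        self.codes = []       # one code per row, indexing into dictionary
        index = {}
        for v in values:
            if v not in index:
                index[v] = len(self.dictionary)
                self.dictionary.append(v)
            self.codes.append(index[v])

    def decode(self):
        """Expand the codes back into the original row values."""
        return [self.dictionary[c] for c in self.codes]


class MockScan:
    """Leaf operator that hands out fixed-size batches from an in-memory list."""

    def __init__(self, rows, batch_size):
        self._rows = rows
        self._batch_size = batch_size
        self._pos = 0

    def next_batch(self):
        """Return the next DictionaryVector, or None when exhausted."""
        if self._pos >= len(self._rows):
            return None
        chunk = self._rows[self._pos:self._pos + self._batch_size]
        self._pos += len(chunk)
        return DictionaryVector(chunk)


class Selection:
    """Keeps rows equal to `wanted`, comparing codes rather than strings."""

    def __init__(self, child, wanted):
        self._child = child
        self._wanted = wanted

    def next_batch(self):
        batch = self._child.next_batch()
        if batch is None:
            return None
        if self._wanted not in batch.dictionary:
            return DictionaryVector([])  # nothing in this batch matches
        code = batch.dictionary.index(self._wanted)
        kept = [c for c in batch.codes if c == code]
        return DictionaryVector([batch.dictionary[c] for c in kept])
```

A caller would drive such a tree by calling next_batch() on the root operator until it returns None.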