Drill, mail # dev
Re: Optiq, LucidDB, DynamoBI, Eigenbase and Saffron sitting in a tree...
Julian Hyde 2012-12-04, 08:12
On Nov 29, 2012, at 11:24 AM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:

> This is probably primarily for Julian but anybody else who knows the
> history...
>
> It seems like Saffron was absorbed by Eigenbase which became
> DynamoBI/LucidDB.  Around the same time, Optiq forked some of the Eigenbase
> code for a separate purpose.  I'd love a short history lesson on their
> relationship and provenance. If we rely more on the Eigenbase code than the
> Optiq code, is Optiq the latest fork of that code?

Jacques,

Every family is always delighted to be asked about their history. Thanks for asking!

You pretty much have it right. Eigenbase was/is a standards-compliant framework for building data management systems, and was the basis for SQLstream and LucidDB, both of which proved the technology in production with many customers. The code came from me (Saffron), John Sichi (physical layer), Broadbase code for bitmap indexes & column store (acquired & donated to the project by LucidEra), SQLstream and LucidEra employees, and other contributors. All contributors signed the appropriate contribution agreements.

Although Eigenbase was successful in these two products, it didn't get as much adoption as a framework as we'd hoped. I think the problem was the license (then GPL) and the complexity (Eigenbase is quite a lot of code, because it implements a lot of SQL functionality, and is a hybrid of C++ and Java).

Last year LucidDB changed to the Apache license, and I set about solving the other problem by whittling LucidDB down from a full database to a query-planning engine in pure Java, with the SQL parser & validator as optional extras. My rationale is that there are plenty of storage engines and runtimes out there (Drill is another welcome addition), but it is difficult to build the full stack from scratch, and a lot of projects stop short of the full SQL+optimizer support that most of their users want. (Hence all these NoSQL systems that spent their first few years of life claiming that lack of SQL support was a virtue.)

Optiq has a much smaller surface area than LucidDB; and as a framework, that surface area is a Java API (with some SPIs), whereas LucidDB's surface area was both SQL and Java.
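
To make that concrete, here is a rough sketch of what embedding Optiq through that Java surface looks like from the host application's side. The driver class name and JDBC URL are from memory, and the schema registration (done through the SPI) is elided, so treat it as a sketch rather than documentation:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class OptiqEmbeddingSketch {
      public static void main(String[] args) throws Exception {
        // Assumed driver class and URL; the exact names may differ.
        Class.forName("net.hydromatic.optiq.jdbc.Driver");
        Connection connection = DriverManager.getConnection("jdbc:optiq:");

        // A real embedding would register its schemas and tables with the
        // root schema here, through Optiq's schema SPI, before querying them.
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery("values (1, 'hello')");
        while (resultSet.next()) {
          System.out.println(resultSet.getInt(1) + ", " + resultSet.getString(2));
        }
        resultSet.close();
        statement.close();
        connection.close();
      }
    }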

> Other helpful information would be:
> What is the vintage of the various components?  For example, I see pre Java
> 1.5 approaches various places and more modern approaches elsewhere.

The "pre Java 1.5" approaches, you're probably referring to the org.eigenbase.util14 package, and the Enum14 class that emulates enums before there were added to the language in JDK 1.5. We tried to stay compatible with as many JDKs as possible, and we kept support for JDK 1.4 clients for about a year after we dropped server support.

The util14 package could do with a clean-out, which I can do now that Optiq doesn't have a user base. There's a lot of code I'd like to remove, such as the dependency on the openjava Java object model; I'm keeping it out of Optiq's public API, and I will remove it in due course.

> What SQL standard was the parser built for?

SQL:2003 (features such as window functions and MERGE; partial implementation of the SQL/XML functions). SQL:2008 and SQL:2011 have come since, but don't introduce major changes.

> What is the status of the various components?  Which are WIP and which are
> "solid".  For example I saw the following in
> org.eigenbase.relopt.volcano.VolcanoPlanner
>
>    /**
>     * If true, the planner keeps applying rules as long as they continue to
>     * reduce the cost. If false, the planner terminates as soon as it has
>     * found any implementation, no matter how expensive. The default is
>     * false due to unresolved bugs with various rules.
>     */
>    protected boolean ambitious = true;

Most of the components are very solid, VolcanoPlanner included. That comment is the kind of thing you find in most non-trivial production code. VolcanoPlanner is still evolving, and evolving code often has a few experimental features inside it.

By the way, it wouldn't be out of the question to swap out VolcanoPlanner, the way that Linux seems to swap out its scheduler every couple of years. As evidence of that, there is an alternative planner implementation, called HepPlanner, that can use the same rules, and has also been used in production.
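
Schematically, swapping planners looks something like the sketch below: both planners implement the same RelOptPlanner interface and consume the same rule objects, so the caller only chooses which engine applies them. Class and method names are from memory of the Eigenbase code base, and real use needs more setup (trait definitions, calling conventions), so treat it as approximate:

    import org.eigenbase.rel.RelNode;
    import org.eigenbase.relopt.RelOptPlanner;
    import org.eigenbase.relopt.RelOptRule;
    import org.eigenbase.relopt.hep.HepPlanner;
    import org.eigenbase.relopt.hep.HepProgramBuilder;
    import org.eigenbase.relopt.volcano.VolcanoPlanner;

    public class PlannerSwapSketch {
      // Cost-based search: keep applying rules, keep the cheapest plan found.
      static RelNode optimizeWithVolcano(RelNode root, RelOptRule rule) {
        RelOptPlanner planner = new VolcanoPlanner();
        planner.addRule(rule);
        planner.setRoot(root);
        return planner.findBestExp();
      }

      // Heuristic pass: the same rule, applied in a fixed program order
      // rather than by cost-based search. (Builder method names approximate.)
      static RelNode optimizeWithHep(RelNode root, RelOptRule rule) {
        HepProgramBuilder builder = new HepProgramBuilder();
        builder.addRuleInstance(rule);
        RelOptPlanner planner = new HepPlanner(builder.createProgram());
        planner.setRoot(root);
        return planner.findBestExp();
      }
    }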

John Sichi, the project lead, was conservative in labeling the stability of components [see http://eigenbase.wikispaces.com/ComponentInventory]. A component's API was only labeled "solid" if it could not conceivably change in the future. Even the components labeled "experimental" were high quality in my experience.

The latest stage in the evolution is to simplify. I intend to keep the pieces that are useful to a few representative pilot projects that use the framework (one of which, I hope, will be Drill). That means I will be removing a lot of that code.

Now that my time is freeing up a little, I thought I could prototype an integration between Optiq and Drill. Although the ideal configuration would be for Drill to embed Optiq -- Drill would parse DrQL, pass it to Optiq to optimize, then execute the resulting plan -- a simpler configuration for demonstration purposes would be for Optiq to parse SQL and pass that tree to rules that generate Drill plans. If you like that idea, we should choose one or two Drill plans, and I will write the rules to generate them. Let me know whether you like that approach.
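
To illustrate that second configuration, here is a deliberately toy, self-contained sketch of the shape of the thing: a relational tree of the kind Optiq would produce is walked by translation rules that emit fragments of a Drill logical plan (JSON). Every type below is invented for illustration; neither project's real classes appear:

    import java.util.ArrayList;
    import java.util.List;

    public class SqlToDrillSketch {
      // Stand-ins for the relational operators Optiq would produce.
      interface RelExpr {}
      static class RelScan implements RelExpr {
        final String table;
        RelScan(String table) { this.table = table; }
      }
      static class RelFilter implements RelExpr {
        final RelExpr input;
        final String condition;
        RelFilter(RelExpr input, String condition) {
          this.input = input;
          this.condition = condition;
        }
      }

      // Translate the tree, bottom-up, into Drill-style operator
      // descriptions (rendered here as simple JSON-ish strings).
      static List<String> toDrillPlan(RelExpr rel) {
        List<String> ops = new ArrayList<String>();
        emit(rel, ops);
        return ops;
      }

      private static void emit(RelExpr rel, List<String> ops) {
        if (rel instanceof RelScan) {
          ops.add("{\"op\": \"scan\", \"source\": \"" + ((RelScan) rel).table + "\"}");
        } else if (rel instanceof RelFilter) {
          RelFilter filter = (RelFilter) rel;
          emit(filter.input, ops);
          ops.add("{\"op\": \"filter\", \"expr\": \"" + filter.condition + "\"}");
        }
      }

      public static void main(String[] args) {
        // e.g. the tree a planner might produce for
        //   SELECT * FROM donuts WHERE ppu > 0.5
        RelExpr tree = new RelFilter(new RelScan("donuts"), "ppu > 0.5");
        for (String op : toDrillPlan(tree)) {
          System.out.println(op);
        }
      }
    }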

Julian