Re: Canonical examples
Julian Hyde 2013-09-11, 21:13
The FoodMart data set, which I use for Mondrian testing, is worth considering. It is organized as a star schema, and it is generated, but unlike TPC-DS the names are human-readable. It has been the main demo data set for my Mondrian OLAP engine for several years.
The data set is fairly small (about 5MB compressed, 26 tables, 250k fact rows, largest dimension 10k rows) but someone could write a program to scale it up.
It is already in the Drill code base (via the dependency on mondrian-data-foodmart-json in sqlparser/pom.xml).
On Sep 10, 2013, at 12:58 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> TPC-DS is much more typical of where Drill is going to find itself.
> As an intro, simpler might be better (I think that may be part of your
> point). But realism is good as well.
> On Tue, Sep 10, 2013 at 11:22 AM, Ben Becker <[EMAIL PROTECTED]> wrote:
>> TPC-DS is modeled after a retail product supplier. I'm hesitant to use the
>> actual TPC-DS schemata though, since it's not optimized for concise
>> explanation (e.g. Hungarian-notation-esque column names).
>> I think the donut shop metaphor could be adapted to a similar set of
>> queries (they are both retail product suppliers, after all). There are
>> still some subtle nuances though; for example, it may not make as much
>> sense for a donut shop to accept returns. :)
>> How about a hardware store (perhaps one that sells lots of drills)?
>> On Tue, Sep 10, 2013 at 10:37 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>> I think that the donut examples are too small and limited to be useful
>>> beyond the issues presented by nested data.
>>> The TPC derived data like nations and regions provides a bunch of other
>>> examples such as grouping, joins and correlated sub-queries. Creating
>>> additional examples when we already have what we need doesn't seem all that
>>> important. I can be proved wrong if somebody instantly comes up with
>>> examples that are donut themed and provide all of the necessary examples.
>>> It should also be noted that we are going to wind up with examples from
>>> TPC-DS, which has a realistic snowflake schema and volume. Restating all of
>>> those examples in terms of donuts is a whole lot more work than restating
>>> two files. So we are eventually going to have to give up on donuts.
>>> Why not now?
>>> On Tue, Sep 10, 2013 at 9:52 AM, Ben Becker <[EMAIL PROTECTED]> wrote:
>>>> Hi All,
>>>> As we continue to update Drill's user documentation, I think it's important
>>>> that everything is consistent and cohesive. One way to move toward this
>>>> goal is to use a single metaphor in our examples. In the existing
>>>> documentation, this has primarily been a donut store.
>>>> Are there any objections or concerns about continuing with this metaphor?
>>>> Please feel free to propose a new metaphor if you think it will help
>>>> Note that this is not meant to be a binding decision, and does not preclude
>>>> other types of examples when appropriate.