Drill, mail # dev - Re: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries


Re: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries
Sree V 2013-07-25, 14:13
Hi All,

I hope I didn't keep you waiting on this for long.

Downloaded:
http://www.tpc.org/tpch/spec/tpch_2_16_0.zip

I set up the VC++ project, as mentioned in the Tuesday standup before last.
I found out later that it is not required; it is only for generating the data and the SQL queries,
and the ZIP above already has both bundled.
That VC++ project didn't compile cleanly on my machine anyway.

1.
I created POJOs for the schema (page 13) defined in tpch.2.16.0.pdf,
as a separate, simple Java project.

Then I used the super-csv and gson libraries to convert the data files from PSV (pipe-separated values)
into JSON files.

Now the data is in JSON and the schema is in POJOs.
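
For reference, the PSV-to-JSON step can be sketched roughly as below. This is not the converter I actually ran: it swaps super-csv and gson for plain JDK string handling so that it stays self-contained. The class name PsvToJson, the helper names, and the sample row are made up for illustration; the column names are those of the TPC-H region table. It assumes each data line ends with a trailing '|' delimiter and it writes every value as a JSON string, with no type conversion.

```java
import java.util.StringJoiner;

public class PsvToJson {

    // Escapes characters that may not appear raw inside a JSON string.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Converts one pipe-separated line into a flat JSON object.
    // Assumption: the line may carry one trailing '|' delimiter, which is dropped.
    static String toJson(String[] columns, String psvLine) {
        String line = psvLine.endsWith("|")
                ? psvLine.substring(0, psvLine.length() - 1)
                : psvLine;
        String[] fields = line.split("\\|", -1); // -1 keeps empty fields
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                    "expected " + columns.length + " fields, got " + fields.length);
        }
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (int i = 0; i < columns.length; i++) {
            json.add("\"" + escape(columns[i]) + "\":\"" + escape(fields[i]) + "\"");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        String[] columns = {"r_regionkey", "r_name", "r_comment"};
        // Hypothetical sample row, not taken from the real data files.
        System.out.println(toJson(columns, "0|AFRICA|sample comment|"));
    }
}
```

A real converter would map the numeric and date columns to proper types (which is what the POJOs plus gson give), rather than emitting strings everywhere.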
2.

In the query-parser project, in DrqlParserTest.java, I added one test method,
testTPCHSql1(), to run the query parser on Sql1 (out of the 20+5 SQL queries from TPC-H).

I copied the entire method to the end of this e-mail for your reference,
along with the information I gathered while debugging and the console output.
The test method printed an error on the console along with the output that I am printing.
Is the error valid? Does it need to be fixed? Should I create another JIRA for it?
Is the output a good enough result of parsing a SQL query? Should I go ahead with the rest of the SQLs?
[Sree Vaddi:] It seems I should be using the 'sqlparser' project. Any samples or thoughts?
3.
How do I apply the parsed SQL from 2. above to the data from 1. above, to output the
Logical Plan?
Please advise.
Thanking you.
With Regards
Sree

Supporting code for 2. above and debug info:

    @Test
    public void testTPCHSql1() {
        String drqlQueryText = "select " +
            "l_returnflag, l_linestatus, " +
            "sum(l_quantity) as sum_qty, " +
            "sum(l_extendedprice) as sum_base_price, " +
            "sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, " +
            "sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, " +
            "avg(l_quantity) as avg_qty, " +
            "avg(l_extendedprice) as avg_price, " +
            "avg(l_discount) as avg_disc, " +
            "count(*) as count_order " +
        "from " +
            "lineitem " +
        "where " +
            "l_shipdate <= date '1998-12-01' - interval ':1' day (3) " +
        "group by " +
            "l_returnflag, " +
            "l_linestatus " +
        "order by " +
            "l_returnflag, " +
            "l_linestatus;";
       
        DrqlParser parser = new AntlrParser();
        SemanticModelReader query = parser.parse(drqlQueryText);
       
        System.out.println(query.getFromClause());
        System.out.println(query.getGroupByClause());
        System.out.println(query.getJoinOnClause());
        System.out.println(query.getjustATable());
        System.out.println(query.getLimitClause());
        System.out.println(query.getOrderByClause());
        System.out.println(query.getResultColumnList().size());
        System.out.println(query.getWhereClause());
        /*
setup debug info:
line#2299 DrqlAntlrParser
2320
3682
4884
5363

392
6664

#1207 DrqlAntlrLexer.mDiv()
part of the sql parsing:
// l_shipdate <= date '1998-12-01' - interval ':1' day (3)
variable value: (parsing location in sql i.e the location of letter 'd' in date)
[@125,378:379='<=',<52>,1:378]

looks like the 'date' is interpreted as 'div' ?!

test method console output:
line 1:382 mismatched character 'A' expecting ' '
line 1:416 mismatched character 'A' expecting ' '

[org.apache.drill.parsers.impl.drqlantlr.SemanticModel@3a86edfe]
[]
null
null
null
[]
10
null

         */
    }
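
To see which parts of the query the two console errors point at, the reported positions can be mapped back to the query text. The sketch below is plain JDK and does not touch the Drill parser. It assumes the numbers after "line 1:" are zero-based character offsets into the query string, which fits the debug token above that places '<=' at 378. The class name ErrorColumns and the helper wordAt are made up for illustration.

```java
public class ErrorColumns {

    // The query text, concatenated exactly as in testTPCHSql1().
    static final String QUERY = "select " +
            "l_returnflag, l_linestatus, " +
            "sum(l_quantity) as sum_qty, " +
            "sum(l_extendedprice) as sum_base_price, " +
            "sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, " +
            "sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, " +
            "avg(l_quantity) as avg_qty, " +
            "avg(l_extendedprice) as avg_price, " +
            "avg(l_discount) as avg_disc, " +
            "count(*) as count_order " +
        "from " +
            "lineitem " +
        "where " +
            "l_shipdate <= date '1998-12-01' - interval ':1' day (3) " +
        "group by " +
            "l_returnflag, " +
            "l_linestatus " +
        "order by " +
            "l_returnflag, " +
            "l_linestatus;";

    // Returns the whitespace-delimited word that covers the given zero-based offset.
    static String wordAt(String text, int offset) {
        int start = offset;
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
            start--;
        }
        int end = offset;
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
            end++;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // 378 is the '<=' token from the debug info; 382 and 416 are the error positions.
        for (int offset : new int[] {378, 382, 416}) {
            System.out.println(offset + " -> '" + QUERY.charAt(offset)
                    + "' in \"" + wordAt(QUERY, offset) + "\"");
        }
    }
}
```

Under that assumption, offset 382 is the 'a' in date and offset 416 is the 'a' in day, so both errors fall on the second letter of a keyword starting with 'd'. That seems to fit the observation in the debug notes that 'date' ends up in DrqlAntlrLexer.mDiv().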

________________________________
 From: Sree Vaddi (JIRA) <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, July 25, 2013 7:07 AM
Subject: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries
 
     [ https://issues.apache.org/jira/browse/DRILL-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on DRILL-47 started by Sree Vaddi.

> Generate Logical Plans for TPC-H Queries
> ------------------------