

Re: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries
Hi All,

I hope I didn't keep you waiting on this for long.

Downloaded:
http://www.tpc.org/tpch/spec/tpch_2_16_0.zip

I set up the VC++ project, as mentioned in the Tuesday standup before last.
I found out later that it is not required: it only generates the data and SQL files,
and the ZIP above already bundles both.
The VC++ project did not compile cleanly on my machine anyway.

1.
I created POJOs for the schema (page 13 of tpch.2.16.0.pdf)
as a separate, simple Java project.

Then I used the super-csv and gson libraries to convert the PSV (pipe-separated values)
data files into JSON files.

Now the data is in JSON and the schema is in POJOs.
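
For reference, the conversion in step 1 can be sketched roughly as follows. This is a minimal illustration using only the JDK, not the super-csv/gson code actually used: the class name PsvToJson, the column names, and the sample row are assumptions made for the example, every value is emitted as a JSON string, and each row is assumed to end with a trailing '|'.

```java
import java.util.ArrayList;
import java.util.List;

public class PsvToJson {

    // Split one pipe-separated record into its fields.
    // Assumption: a row may end with a trailing '|', which would otherwise
    // produce one extra empty field at the end.
    public static List<String> splitRecord(String line) {
        String[] parts = line.split("\\|", -1);
        int count = parts.length;
        if (count > 0 && parts[count - 1].isEmpty()) {
            count--; // drop the empty field created by the trailing delimiter
        }
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            fields.add(parts[i]);
        }
        return fields;
    }

    // Escape only double quotes and backslashes; control characters are not
    // handled in this sketch.
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build a flat JSON object from column names and one PSV record.
    // All values are written as strings; no numeric or date typing is done.
    public static String toJson(String[] columns, String line) {
        List<String> fields = splitRecord(line);
        if (fields.size() != columns.length) {
            throw new IllegalArgumentException(
                "expected " + columns.length + " fields but got " + fields.size());
        }
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escape(columns[i])).append("\":\"")
              .append(escape(fields.get(i))).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Illustrative column names and row; not copied from the TPC-H data files.
        String[] columns = {"r_regionkey", "r_name", "r_comment"};
        System.out.println(toJson(columns, "0|AFRICA|sample comment|"));
    }
}
```

With gson and typed POJOs, numeric and date columns would come out as proper JSON types rather than strings; the sketch only shows the row-to-object mapping.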
2.
In the query-parser project, I added one test method, testTPCHSql1(), to DrqlParserTest.java.
It runs the query parser on Sql1 (one of the 20+5 SQLs from TPC-H).

I copied the entire method to the end of this e-mail for your reference,
along with the debugging info I gathered and the console output.
The test method printed an error on the console, along with the output that I am printing.
Is the error valid? Does it need to be fixed? Should I create another JIRA for it?
Is the output a good enough result of parsing a SQL? Should I go ahead with the rest of the SQLs?
[Sree Vaddi:] It seems I should be using the 'sqlparser' project. Any samples or thoughts?
3.
How do I apply the parsed SQL from 2. above to the data from 1. above to output the
Logical Plan?
Please advise.
Thanking you.
With Regards
Sree

Supporting code for 2. above and debug info:

    @Test
    public void testTPCHSql1() {
        String drqlQueryText = "select " +
            "l_returnflag, l_linestatus, " +
            "sum(l_quantity) as sum_qty, " +
            "sum(l_extendedprice) as sum_base_price, " +
            "sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, " +
            "sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, " +
            "avg(l_quantity) as avg_qty, " +
            "avg(l_extendedprice) as avg_price, " +
            "avg(l_discount) as avg_disc, " +
            "count(*) as count_order " +
        "from " +
            "lineitem " +
        "where " +
            "l_shipdate <= date '1998-12-01' - interval ':1' day (3) " +
        "group by " +
            "l_returnflag, " +
            "l_linestatus " +
        "order by " +
            "l_returnflag, " +
            "l_linestatus;";
       
        DrqlParser parser = new AntlrParser();
        SemanticModelReader query = parser.parse(drqlQueryText);
       
        System.out.println(query.getFromClause());
        System.out.println(query.getGroupByClause());
        System.out.println(query.getJoinOnClause());
        System.out.println(query.getjustATable());
        System.out.println(query.getLimitClause());
        System.out.println(query.getOrderByClause());
        System.out.println(query.getResultColumnList().size());
        System.out.println(query.getWhereClause());
        /*
setup debug info:
line#2299 DrqlAntlrParser
2320
3682
4884
5363

392
6664

#1207 DrqlAntlrLexer.mDiv()
part of the sql parsing:
// l_shipdate <= date '1998-12-01' - interval ':1' day (3)
variable value: (parsing location in sql i.e the location of letter 'd' in date)
[@125,378:379='<=',<52>,1:378]

looks like the 'date' is interpreted as 'div' ?!

test method console output:
line 1:382 mismatched character 'A' expecting ' '
line 1:416 mismatched character 'A' expecting ' '

[org.apache.drill.parsers.impl.drqlantlr.SemanticModel@3a86edfe]
[]
null
null
null
[]
10
null

         */
    }
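
One detail of the query text above: it still carries the ':1' placeholder from the TPC-H query template. The sketch below assumes the placeholder should be replaced with a concrete number of days before the text reaches the parser. The class Q1Params, the helper bindDelta, and the value 90 are illustrative only and are not part of the Drill code base. Whether this has any bearing on the two "mismatched character" messages is untested.

```java
public class Q1Params {

    // Replace the quoted ':1' placeholder with a concrete day count.
    // Hypothetical helper: the name and behavior are assumptions for this sketch.
    public static String bindDelta(String template, int deltaDays) {
        return template.replace("':1'", "'" + deltaDays + "'");
    }

    public static void main(String[] args) {
        String where = "l_shipdate <= date '1998-12-01' - interval ':1' day (3)";
        // 90 is used here purely as an example value.
        System.out.println(bindDelta(where, 90));
    }
}
```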

________________________________
 From: Sree Vaddi (JIRA) <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, July 25, 2013 7:07 AM
Subject: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries
 
     [ https://issues.apache.org/jira/browse/DRILL-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on DRILL-47 started by Sree Vaddi.

> Generate Logical Plans for TPC-H Queries
> ------------------------
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira