|
|
-
Bugs exist in SEMI JOIN?Felix.徐 2012-11-21, 08:52
Hi,
I am using the version 0.9.0 and my tables are the same with TPC-H benchmark: Here is a simple query(works correctly): *Q1* INSERT OVERWRITE TABLE customer_orders_statistics SELECT C_CUSTKEY FROM CUSTOMER LEFT SEMI JOIN( SELECT O_CUSTKEY FROM ORDERS WHERE unix_timestamp(O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd') ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY it means inserting the key of customers who has orders since 1995-12-31 into another table. But if I write the query like this: *Q2* INSERT OVERWRITE TABLE customer_orders_statistics SELECT C_CUSTKEY FROM CUSTOMER LEFT SEMI JOIN ORDERS ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY *AND *unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd') I will get exception from Hive: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Also,If I write the query like this: *Q3* INSERT OVERWRITE TABLE customer_orders_statistics SELECT C_CUSTKEY FROM CUSTOMER LEFT SEMI JOIN ORDERS ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY *WHERE *unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd') Then this query can be executed(wondering the right hand of SEMI JOIN can be referenced in WHERE clause now?), but the result is wrong(comparing to *Q1, Q1*'s result is the same with mysql). |