-Empty Query Results
Peter Marron 2013-08-06, 12:48
I'm running Hive 0.11.0 over Hadoop 1.0.4.
I have a problem with some queries returning empty results.
I have a UDF called rownumber which is deterministic.
Specifically I have two tables, championsvalues and championsrows.
They both have custom imputformats which effect filtering by examining the filter pushed down.
(Looking for predicates using the rownumber UDF.)
The table championsvalues is partitioned whilst the table championsrows is not.
I get the following behaviour:
13/08/06 13:38:47 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
Logging initialized using configuration in jar:file:/home/pmarron/hive-0.11.0/lib/hive-common-0.11.0.jar!/hive-log4j.properties
Hive history file=/tmp/pmarron/[EMAIL PROTECTED]
> select *,rownumber() from championsvalues where aid=6 and rownumber() between 4 and 7;
Time taken: 1.417 seconds
> select *,rownumber() from championsrows where rownumber() between 7 and 9;
1996 Juventus Ajax 1-1 (4-2 pens) Rome 67000 8
1995 Ajax AC Milan 1-0 Vienna 49730 9
1994 AC MIlan Barcelona 4-0 Athens 70000 10
Time taken: 0.141 seconds, Fetched: 3 row(s)
> select *,rownumber() from championsvalues where aid=6 and rownumber(away) between 4 and 7;
1982 Aston Villa Bayern Munich 1-0 Rotterdam 46000 6 5
1998 Real Madrid Juventus 1-0 Amsterdam 47500 6 6
1981 Liverpool Real Madrid 1-0 Paris 48360 6 7
Time taken: 0.167 seconds, Fetched: 3 row(s)
In other words a query over the partitioned table returns nothing unless I pass an argument
to the UDF. I assume that this is an optimization because it evaluates the UDF once, outside
of the query and determines that no rows will be returned. That would explain why, when
I invoke the UDF with an argument it can't make this optimization and so ends up returning results.
What I can't understand is, if this is what's really going on, then why isn't the query over the
un-partitioned table pruned in the same way?
Is this to do with query optimization or is there something else going on?
I have attached a log of the same queries being run with debugging switched on.
This is driving me mad, so any help appreciated.