Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Empty Query Results


Copy link to this message
-
Empty Query Results
Hi,

I'm running Hive 0.11.0 over Hadoop 1.0.4.
I have a problem with some queries returning empty results.
I have a UDF called rownumber which is deterministic.
Specifically I have two tables, championsvalues and championsrows.
They both have custom imputformats which effect filtering by examining the filter pushed down.
(Looking for predicates using the rownumber UDF.)
The table championsvalues is partitioned whilst the table championsrows is not.
I get the following behaviour:

-bash-4.1$ hive
13/08/06 13:38:47 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

Logging initialized using configuration in jar:file:/home/pmarron/hive-0.11.0/lib/hive-common-0.11.0.jar!/hive-log4j.properties
Hive history file=/tmp/pmarron/[EMAIL PROTECTED]
hive>
    > select *,rownumber() from championsvalues where aid=6 and rownumber() between 4 and 7;
OK
Time taken: 1.417 seconds
hive>
    > select *,rownumber() from championsrows where rownumber() between 7 and 9;
OK
1996    Juventus        Ajax    1-1 (4-2 pens)  Rome    67000   8
1995    Ajax    AC Milan        1-0     Vienna  49730   9
1994    AC MIlan        Barcelona       4-0     Athens  70000   10
Time taken: 0.141 seconds, Fetched: 3 row(s)
hive>
    > select *,rownumber() from championsvalues where aid=6 and rownumber(away) between 4 and 7;
OK
1982    Aston Villa     Bayern Munich   1-0     Rotterdam       46000   6       5
1998    Real Madrid     Juventus        1-0     Amsterdam       47500   6       6
1981    Liverpool       Real Madrid     1-0     Paris   48360   6       7
Time taken: 0.167 seconds, Fetched: 3 row(s)
hive>
    >
    > quit;

In other words a query over the partitioned table returns nothing unless I pass an argument
to the UDF. I assume that this is an optimization because it evaluates the UDF once, outside
of the query and determines that no rows will be returned. That would explain why, when
I invoke the UDF with an argument it can't make this optimization and so ends up returning results.
What I can't understand is, if this is what's really going on, then why isn't the query over the
un-partitioned table pruned in the same way?

Is this to do with query optimization or is there something else going on?
I have attached a log of the same queries being run with debugging switched on.

This is driving me mad, so any help appreciated.

Z