-Re: max() returns NULL on partition column
Robin Morris 2013-04-22, 22:54
Thanks for confirming that I wasn't imagining this. Turns out this was a
known bug (HIVE-2955 https://issues.apache.org/jira/browse/HIVE-2955 ) and
is already fixed in version 0.10.0
On 4/22/13 11:25 AM, "Mike Liddell" <[EMAIL PROTECTED]> wrote:
>I was interested so tried a quick repro. Yes I see the behavior too.
>Environment: hive-0.9.0 on windows.
>Hive> CREATE TABLE ptable(val int) PARTITIONED BY(year int, month int,
>Hive> LOAD DATA LOCAL INPATH d:/data/2013-01-01.data.txt INTO TABLE
>// the txt file has separate lines with values 1,2,3,4.
>Hive> select max(day) from ptable;
>Hive> select avg(day) from ptable;
>Other normal queries work fine. I didn't try other aggregations.
>From: Robin Morris [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, April 18, 2013 1:15 PM
>To: [EMAIL PROTECTED]
>Subject: max() returns NULL on partition column
>partitioned_table is partitioned on year, month, day.
>A query of the form
>> select max(day) from partitioned_table where year=2013 and month=4;
>spins up zero mappers, one reducer, and returns NULL
>Hadoop job information for Stage-1: number of mappers: 0; number of
>2013-04-18 12:35:57,842 Stage-1 map = 0%, reduce = 0%
>2013-04-18 12:36:00,855 Stage-1 map = 0%, reduce = 100%, Cumulative CPU
>2013-04-18 12:36:01,860 Stage-1 map = 0%, reduce = 100%, Cumulative CPU
>2013-04-18 12:36:02,865 Stage-1 map = 100%, reduce = 100%, Cumulative CPU
>MapReduce Total cumulative CPU time: 1 seconds 360 msec Ended Job >job_201302011159_205031 MapReduce Jobs Launched:
>Job 0: Reduce: 1 Accumulative CPU: 1.36 sec HDFS Read: 0 HDFS Write: 3
>Total MapReduce CPU Time Spent: 1 seconds 360 msec OK NULL Time taken:
>Trying to work round it by doing something like select max(day) from
>(select day from partitioned_table where year=2013 and month=4)foo;
>doesn't work either, I'm guessing because the optimizer pushes down the
>max() into the subquery, resulting in the same query as above.
>Using an explicit intermediate table does work
>> create table foo_max as select day from partitioned_table where
>>year=2013 and month=4; select max(day) from foo_max; drop table
>Several map-reduce jobs later, the correct answer (of 18) is given.
>distinct() has similar behavior, except that rather than returning NULL,
>it returns nothing.
>min() returns NULL
>If someone else can reproduce this, I'll submit a bug.