Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Multi-group-by select always scans entire table


Copy link to this message
-
Re: Multi-group-by select always scans entire table
Mark Grover 2012-06-04, 17:20
Hi Jan,
Glad you found something workable.

What version of Hive are you using? Could you also please check what the value of the property hive.optimize.ppd is for you?

Thanks,
Mark

----- Original Message -----
From: "Jan Dolinár" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, May 29, 2012 1:57:25 AM
Subject: Re: Multi-group-by select always scans entire table
On Fri, May 25, 2012 at 12:03 PM, Jan Dolinár < [EMAIL PROTECTED] > wrote:
-- see what happens when you try to perform multi-group-by query on one of the partitions
EXPLAIN EXTENDED
FROM partition_test
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
WHERE (part_col=2)
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1
WHERE (part_col=2);
-- result: it wants to scan all partitions :-(
Since nobody else did, let me answer myself... In the end I found out that the correct partition pruning can be achieved using subquery. Continuing the example from my last post, the query would be:
FROM (
SELECT * FROM partition_test
LATERAL VIEW explode(col1) tmp AS exp_col1
WHERE part_col=2
) t
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1;
I still think the pruning should work correctly no matter how the query is written, but for now I'm happy with this solution.
J. Dolinar