Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Multi-group-by select always scans entire table


Copy link to this message
-
Re: Multi-group-by select always scans entire table
On Fri, May 25, 2012 at 12:03 PM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
>
> -- see what happens when you try to perform multi-group-by query on one of
> the partitions
> EXPLAIN EXTENDED
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
>     SELECT exp_col1
>     WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
>     SELECT exp_col1
>     WHERE (part_col=2);
> -- result: it wants to scan all partitions :-(
>

Since nobody else did, let me answer myself... In the end I found out that
the correct partition pruning can be achieved using subquery. Continuing
the example from my last post, the query would be:

FROM (
    SELECT * FROM partition_test
    LATERAL VIEW explode(col1) tmp AS exp_col1
    WHERE part_col=2
) t
INSERT OVERWRITE DIRECTORY '/test/1'
    SELECT exp_col1
INSERT OVERWRITE DIRECTORY '/test/2'
    SELECT exp_col1;

I still think the pruning should work correctly no matter how the query is
written, but for now I'm happy with this solution.

J. Dolinar
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB