Hive >> mail # user >> Multi-group-by select always scans entire table


Re: Multi-group-by select always scans entire table
On 6/7/12, Mark Grover <[EMAIL PROTECTED]> wrote:
> Can you please check if predicate push down enabled changes the explain
> plan on a simple inner join query like:
>
> select a.* from a inner join b on(a.key=b.key) where a.some_col=blah;

No problem, I ran the following as you suggested (INNER JOIN didn't work
for me, so I used plain JOIN):

-- Two minimal tables for the test
create table a (key int, some_col string);
create table b (key int, some_col string);

-- Predicate pushdown enabled
set hive.optimize.ppd=true;
explain select a.* from a join b on(a.key=b.key) where a.some_col='blah';

-- Predicate pushdown disabled
set hive.optimize.ppd=false;
explain select a.* from a join b on(a.key=b.key) where a.some_col='blah';

There is a difference between the two explain plans: the first one
(with hive.optimize.ppd=true) has a Filter operator on some_col quite
high in the tree, so I guess pushdown is working here. However, both
plans still contain another Filter operator deeper down, in the reduce
stage. I'm not sure whether that is correct; I would expect the filter
to be executed only once. I put the results on pastebin so you can see
for yourself: http://pastebin.com/gquMksqE and
http://pastebin.com/0FPx7KKG.

Jan