Hive, mail # user - Multi-group-by select always scans entire table


Re: Multi-group-by select always scans entire table
Jan Dolinár 2012-06-07, 14:03
On 6/7/12, Mark Grover <[EMAIL PROTECTED]> wrote:
> Can you please check if predicate push down enabled changes the explain
> plan on a simple inner join query like:
>
> select a.* from a inner join b on(a.key=b.key) where a.some_col=blah;

No problem. I ran the following, as you suggested (INNER JOIN didn't work
for me, so I used plain JOIN):

create table a (key int, some_col string);
create table b (key int, some_col string);

set hive.optimize.ppd=true;
explain select a.* from a join b on(a.key=b.key) where a.some_col='blah';

set hive.optimize.ppd=false;
explain select a.* from a join b on(a.key=b.key) where a.some_col='blah';

The two explain plans differ: with hive.optimize.ppd=true there is a Filter
operator on some_col quite high in the tree, so I guess predicate pushdown
is working here. However, both plans also contain another Filter operator
deeper down, in the reduce stage. I'm not sure whether that is correct; I
believe the filter should only be executed once. I put the results on
pastebin so you can see for yourself: http://pastebin.com/gquMksqE and
http://pastebin.com/0FPx7KKG.
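
For comparison, here is a sketch of the same query with the filter pushed
down by hand into a subquery, which I assume is roughly what the optimizer
aims for when hive.optimize.ppd=true. I have not run this against the
tables above, so treat it as an illustration rather than a verified plan:

```sql
-- Hand-written pushdown: filter table a before the join instead of after it.
-- For this join the result should match the original query's result.
explain
select a.*
from (select * from a where some_col = 'blah') a
join b on (a.key = b.key);
```

If its plan shows the Filter operator only on the map side, that would
support the idea that the second Filter in the reduce stage is redundant.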

Jan