Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> different outer join plan between hive 0.9 and hive 0.10


Copy link to this message
-
Re: different outer join plan between hive 0.9 and hive 0.10
It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).

Thanks for reporting it.

2013/3/24 wzc <[EMAIL PROTECTED]>:
> Recently we tried to upgrade our hive from 0.9 to 0.10, but found some of
> our hive queries almost 7 times slow.  One of such query consists multiple
> table outer join on the same key. By looking into the query,  we found the
> query plans generate by hive 0.9 and hive 0.10 are different.  Here is the
> example:
>
> testcase:
>
> use default;
> create table test_join (
>   `key` string,
>   `value` string
> );
>
> explain select
>     sum(a.value) val
> from default.test_join a
> left outer join default.test_join b on a.key = b.key
> left outer join default.test_join c on a.key = c.key
> left outer join default.test_join d on a.key = d.key
> left outer join default.test_join e on a.key = e.key
> left outer join default.test_join f on a.key = f.key
> left outer join default.test_join g on a.key = g.key
>
>
> the explain of hive 0.9:
>
>  STAGE DEPENDENCIES:
>
>     Stage-1 is a root stage
>
>     Stage-2 depends on stages: Stage-1
>
>     Stage-0 is a root stage
>
> ...
>
> Reduce Operator Tree:
>
>           Join Operator
>
>             condition map:
>
>                  Left Outer Join0 to 1
>
>                  Left Outer Join0 to 2
>
>                  Left Outer Join0 to 3
>
>                  Left Outer Join0 to 4
>
>                  Left Outer Join0 to 5
>
>                  Left Outer Join0 to 6
>
>             condition expressions:
>
>               0 {VALUE._col1}
>
>               1
>
>               2
>
>               3
>
>               4
>
>               5
>
>               6
>
> ......
>
>
> while the explain of hive 0.10:
>
>   STAGE DEPENDENCIES:
>
>     Stage-6 is a root stage
>
>     Stage-1 depends on stages: Stage-6
>
>     Stage-2 depends on stages: Stage-1
>
>     Stage-0 is a root stage
>
> ...
>
>      Reduce Operator Tree:
>
>           Join Operator
>
>             condition map:
>
>                  Left Outer Join0 to 1
>
>                  Left Outer Join0 to 2
>
>             condition expressions:
>
>               0 {VALUE._col0} {VALUE._col1}
>
>               1
>
>               2
>
> ...
>
>      Reduce Operator Tree:
>
>           Join Operator
>
>             condition map:
>
>                  Left Outer Join0 to 1
>
>                  Left Outer Join0 to 2
>
>                  Left Outer Join0 to 3
>
>                  Left Outer Join0 to 4
>
>             condition expressions:
>
>               0 {VALUE._col9}
>
>               1
>
>               2
>
>               3
>
>               4
>
> ....
>
>
> It seems like hive 0.9 use only one stage/job to process all outer joins but
> hive 0.10 split them into two stage.  When running such kind of query on
> hive0.10 in production,  in the second stage of outer join process, some
> reducer stucks.
>
> I can't find any param to change the query plain ,  can anyone give me some
> hint?
>
> Thanks!
>
>
>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB