Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> different outer join plan between hive 0.9 and hive 0.10


Copy link to this message
-
different outer join plan between hive 0.9 and hive 0.10
Recently we tried to upgrade our hive from 0.9 to 0.10, but found some of
our hive queries almost 7 times slow.  One of such query consists multiple
table outer join on the same key. By looking into the query,  we found the
query plans generate by hive 0.9 and hive 0.10 are different.  Here is the
example:

testcase:

use default;
create table test_join (
  `key` string,
  `value` string
);

explain select
    sum(a.value) val
from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key
the explain of hive 0.9:

 STAGE DEPENDENCIES:

    Stage-1 is a root stage

    Stage-2 depends on stages: Stage-1

    Stage-0 is a root stage

...

Reduce Operator Tree:

          Join Operator

            condition map:

                 Left Outer Join0 to 1

                 Left Outer Join0 to 2

                 Left Outer Join0 to 3

                 Left Outer Join0 to 4

                 Left Outer Join0 to 5

                 Left Outer Join0 to 6

            condition expressions:

              0 {VALUE._col1}

              1

              2

              3

              4

              5

              6

......
while the explain of hive 0.10:

  STAGE DEPENDENCIES:

    Stage-6 is a root stage

    Stage-1 depends on stages: Stage-6

    Stage-2 depends on stages: Stage-1

    Stage-0 is a root stage

...

     Reduce Operator Tree:

          Join Operator

            condition map:

                 Left Outer Join0 to 1

                 Left Outer Join0 to 2

            condition expressions:

              0 {VALUE._col0} {VALUE._col1}

              1

              2

...

     Reduce Operator Tree:

          Join Operator

            condition map:

                 Left Outer Join0 to 1

                 Left Outer Join0 to 2

                 Left Outer Join0 to 3

                 Left Outer Join0 to 4

            condition expressions:

              0 {VALUE._col9}

              1

              2

              3

              4

....
It seems like hive 0.9 use only one stage/job to process all outer joins
but hive 0.10 split them into two stage.  When running such kind of query
on hive0.10 in production,  in the second stage of outer join process, some
reducer stucks.

I can't find any param to change the query plain ,  can anyone give me some
hint?

Thanks!
+
Navis류승우 2013-03-28, 05:47
+
wzc 2013-03-28, 15:21
+
Navis류승우 2013-03-29, 01:34
+
wzc1989 2013-05-13, 16:11
+
Navis류승우 2013-05-14, 04:26
+
wzc1989 2013-05-14, 05:06
+
Navis류승우 2013-05-14, 06:17
+
wzc1989 2013-07-01, 07:15
+
Navis류승우 2013-07-02, 00:24
+
wzc1989 2013-07-03, 15:27
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB