map side join with group by
I have a silly question about how Hive interprets a simple query with both
a map-side join and a group by.

The query below translates into two jobs: the first is a map-only job that
does the join and stores its output in an intermediate location, and the
second is a map-reduce job that takes the output of the first job as input
and does the group by.

SELECT
/*+ MAPJOIN(table2) */
table.a, sum(table2.b)
from table
LEFT OUTER JOIN table2
ON table.id = table2.id
where hour = '2012-12-11 11'
group by table.a

Why can't this be done within a single map-reduce job? From what I can see
in the query plan, all the second job's mappers do is read the first job's
mapper output.
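
For anyone who wants to look at the same plan, it can be printed by prefixing
the query with EXPLAIN. This is only a sketch using the placeholder names from
the query above: table, table2, and hour stand in for the real table and
column names, and the hint is assumed to name the right-hand table of the join.

```sql
-- Print the execution plan instead of running the query.
-- Placeholder names (table, table2, hour) must be replaced with real ones.
EXPLAIN
SELECT
/*+ MAPJOIN(table2) */
table.a, sum(table2.b)
from table
LEFT OUTER JOIN table2
ON table.id = table2.id
where hour = '2012-12-11 11'
group by table.a;
```

The output lists the stages and their dependencies, which is where the
map-only join stage and the separate map-reduce group-by stage show up.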

--
Chen Song
Mark Grover 2012-12-13, 01:41
Nitin Pawar 2012-12-13, 05:30
Chen Song 2012-12-13, 14:56
Nitin Pawar 2012-12-13, 16:04
Chen Song 2012-12-13, 18:24
Nitin Pawar 2012-12-13, 18:42
Chen Song 2012-12-13, 19:12
Nitin Pawar 2012-12-13, 19:30
Chen Song 2012-12-13, 19:50