Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - map side join with group by


Copy link to this message
-
map side join with group by
Chen Song 2012-12-12, 23:32
I have a silly question on how Hive interpretes a simple query with both
map side join and group by.

Below query will translate into two jobs, with the 1st one as a map only
job doing the join and storing the output in a intermediary location, and
the 2nd one as a map-reduce job taking the output of the 1st job as input
and doing the group by.

SELECT
/*+ MAPJOIN(d) */
table.a, sum(table2.b)
from table
LEFT OUTER JOIN table2
ON table.id = table2.id
where hour = '2012-12-11 11'
group by table.a

Why can't this be done within a single map reduce job? As what I can see
from the query plan is that all 2nd job mapper do is taking the 1st job's
mapper output.

--
Chen Song