Hive >> mail # user >> map side join with group by


map side join with group by
I have a silly question about how Hive interprets a simple query that uses
both a map-side join and a group by.

The query below translates into two jobs. The 1st is a map-only job that
does the join and stores its output in an intermediate location. The 2nd
is a map-reduce job that takes the 1st job's output as input and does the
group by.

SELECT
/*+ MAPJOIN(table2) */
table.a, sum(table2.b)
from table
LEFT OUTER JOIN table2
ON table.id = table2.id
where hour = '2012-12-11 11'
group by table.a
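To make the two-job plan described above concrete, here is a minimal Python sketch that simulates it in memory. It is an illustration under stated assumptions, not Hive's implementation: the sample rows are made up, `hour` is assumed to be a column of `table`, and the function names `job1_map_only_join` and `job2_group_by` are hypothetical. Job 1 builds a hash table from the small side (`table2`) and probes it while streaming `table`, writing joined rows to an intermediate list; job 2 re-reads that list, shuffles by `a`, and sums `b`.

```python
from collections import defaultdict

# Hypothetical sample data: TABLE rows are (id, a, hour), TABLE2 rows are (id, b).
# Assumes `hour` belongs to `table`; the original query does not say which table.
TABLE = [
    (1, "x", "2012-12-11 11"),
    (2, "x", "2012-12-11 11"),  # no match in TABLE2 -> b is NULL (None)
    (3, "y", "2012-12-11 11"),
    (4, "y", "2012-12-11 12"),  # removed by the WHERE clause
]
TABLE2 = [(1, 10), (1, 5), (3, 7)]


def job1_map_only_join(big, small, hour):
    """Job 1 (map only): broadcast hash join, output written to an intermediate 'file'."""
    hashed = defaultdict(list)  # small table held in memory by every mapper
    for id_, b in small:
        hashed[id_].append(b)
    intermediate = []
    for id_, a, h in big:  # mappers stream the big table
        if h != hour:
            continue
        # LEFT OUTER JOIN: unmatched rows survive with b = None
        for b in hashed.get(id_) or [None]:
            intermediate.append((a, b))
    return intermediate


def job2_group_by(rows):
    """Job 2 (map-reduce): mappers re-read job 1's output, reducers SUM(b) per a."""
    shuffled = defaultdict(list)
    for a, b in rows:  # map: pass rows through, keyed by a
        shuffled[a].append(b)
    result = {}
    for a, bs in shuffled.items():  # reduce: SQL SUM ignores NULLs
        non_null = [b for b in bs if b is not None]
        result[a] = sum(non_null) if non_null else None
    return result


print(job2_group_by(job1_map_only_join(TABLE, TABLE2, "2012-12-11 11")))
# → {'x': 15, 'y': 7}
```

In this simulation the mappers of job 2 do nothing but read job 1's output and key it by `a`, which matches what the query plan shows for the 2nd job.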

Why can't this be done within a single map-reduce job? From what I can see
in the query plan, all the 2nd job's mappers do is read the 1st job's
mapper output.
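The single-job alternative the question asks about can be sketched the same way: each mapper of the aggregation job probes the in-memory hash of the small table itself and emits `(a, b)` straight into the shuffle, so no intermediate output is written. This is a hypothetical Python simulation with made-up data (again assuming `hour` is a column of `table`); the names `mapper`, `reducer` and `single_job` are illustrative only and do not describe how Hive's planner works.

```python
from collections import defaultdict

# Same hypothetical data as before: TABLE is (id, a, hour), TABLE2 is (id, b).
TABLE = [
    (1, "x", "2012-12-11 11"),
    (2, "x", "2012-12-11 11"),
    (3, "y", "2012-12-11 11"),
    (4, "y", "2012-12-11 12"),
]
TABLE2 = [(1, 10), (1, 5), (3, 7)]


def mapper(split, small_hash, hour):
    """Map phase: filter, probe the small table's hash, emit (a, b) to the shuffle."""
    for id_, a, h in split:
        if h != hour:
            continue
        for b in small_hash.get(id_) or [None]:  # LEFT OUTER: keep unmatched rows
            yield a, b


def reducer(a, bs):
    """Reduce phase: SUM(b) per group, ignoring NULLs as SQL does."""
    non_null = [b for b in bs if b is not None]
    return a, (sum(non_null) if non_null else None)


def single_job(big, small, hour, num_splits=2):
    small_hash = defaultdict(list)  # broadcast to every mapper
    for id_, b in small:
        small_hash[id_].append(b)
    splits = [big[i::num_splits] for i in range(num_splits)]  # pretend input splits
    shuffled = defaultdict(list)
    for split in splits:
        for a, b in mapper(split, small_hash, hour):
            shuffled[a].append(b)
    return dict(reducer(a, bs) for a, bs in sorted(shuffled.items()))


print(single_job(TABLE, TABLE2, "2012-12-11 11"))
# → {'x': 15, 'y': 7}
```

The result matches the two-job simulation, which only shows that the data flow itself does not need the intermediate write; it says nothing about why a particular Hive version plans the query as two jobs.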

--
Chen Song