Hive >> mail # user >> single MR stage for join and group by

single MR stage for join and group by
Suppose we have two simple tables, A and B, each with the following columns:

id int
value string


When Hive compiles the following query:

select max(A.value), A.id from A join B on A.id = B.id group by A.id;

It launches two MapReduce stages: one for the join and one for the group by.

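My understanding is that if the join key set is a subset of the group-by
key set, the join and the aggregation can be done in the same MapReduce
job, because all rows of a group are already sent to the same reducer.
If that is correct in theory, could this be supported as a feature in Hive?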
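To illustrate the reasoning, here is a minimal in-memory sketch of the MapReduce data flow. It is a simulation under stated assumptions, not Hive's implementation: the sample rows and the helper names (`shuffle`, `map_join_inputs`, `join_group`, `two_stages`, `one_stage`) are hypothetical. It covers the simplest case, where the group-by key equals the join key (`id`). The two-stage version writes the joined rows out and shuffles them again. The one-stage version aggregates inside the join reducer, since that reducer already holds every row with a given `id`.

```python
from collections import defaultdict
from itertools import product


def shuffle(pairs):
    """Group (key, value) pairs by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups


def map_join_inputs(a_rows, b_rows):
    # Map phase: emit each row under its join key (id), tagged with its source table.
    return [(i, ("A", v)) for i, v in a_rows] + [(i, ("B", v)) for i, v in b_rows]


def join_group(tagged):
    # Inner join for one key: cross product of the A-side and B-side values.
    a_vals = [v for tag, v in tagged if tag == "A"]
    b_vals = [v for tag, v in tagged if tag == "B"]
    return list(product(a_vals, b_vals))


def two_stages(a_rows, b_rows):
    # Stage 1 (join): shuffle on id and materialize the joined (id, A.value) rows.
    joined = []
    for i, tagged in shuffle(map_join_inputs(a_rows, b_rows)).items():
        joined.extend((i, a_val) for a_val, _ in join_group(tagged))
    # Stage 2 (group by): shuffle the joined rows again on id, take max(A.value).
    return {i: max(vals) for i, vals in shuffle(joined).items()}


def one_stage(a_rows, b_rows):
    # Single stage: the group-by key equals the join key, so each reducer call
    # already holds its whole group and can aggregate right after joining.
    result = {}
    for i, tagged in shuffle(map_join_inputs(a_rows, b_rows)).items():
        pairs = join_group(tagged)
        if pairs:  # inner join: an id missing from either table yields no group
            result[i] = max(a_val for a_val, _ in pairs)
    return result


# Hypothetical sample data: (id, value) rows for tables A and B.
A = [(1, "apple"), (1, "pear"), (2, "fig"), (3, "kiwi")]
B = [(1, "x"), (2, "y"), (2, "z")]
print(one_stage(A, B))                      # {1: 'pear', 2: 'fig'}
print(two_stages(A, B) == one_stage(A, B))  # True
```

The same argument extends to a group-by key that strictly contains the join key, such as `group by A.id, A.value`. Rows that share the full group-by key also share `id`, so they still meet in one reducer call. The second shuffle only becomes necessary when the group-by key does not include the join key.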

Stephen Sprague 2013-08-02, 00:32
Yin Huai 2013-08-02, 04:14
Chen Song 2013-08-02, 17:32