Hive >> mail # user >> single MR stage for join and group by


single MR stage for join and group by
Suppose we have two simple tables:

A
id int
value string

B
id int

When Hive translates the following query

select max(A.value), A.id from A join B on A.id = B.id group by A.id;

It launches 2 stages, one for the join and one for the group by.

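My understanding is that if the join key set is a subset of the group by
key set, both the join and the group by can be done in the same MapReduce
job, because the shuffle on the join key already brings every row of a
group to the same reducer. If that is correct in theory, could it be a
feature in Hive?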
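
To make the idea concrete, here is a minimal sketch in plain Python, not Hive code and not how Hive's planner works. It simulates a single shuffle keyed on id and then does both the inner join and the max() inside the "reducer". The function name single_stage_join_group and the sample rows are made up for illustration.

```python
from collections import defaultdict


def single_stage_join_group(a_rows, b_rows):
    """Simulate one shuffle keyed on id, then join and aggregate per key.

    a_rows: iterable of (id, value) tuples standing in for table A.
    b_rows: iterable of (id,) tuples standing in for table B.
    """
    # "Map + shuffle": tag each record with its source table and
    # partition everything by id, the join key and the group by key.
    shuffled = defaultdict(lambda: {"A": [], "B": []})
    for row_id, value in a_rows:
        shuffled[row_id]["A"].append(value)
    for (row_id,) in b_rows:
        shuffled[row_id]["B"].append(row_id)

    # "Reduce": all rows for a given id are already together, so the
    # inner join and max() need no second shuffle.
    result = {}
    for row_id, groups in shuffled.items():
        if groups["A"] and groups["B"]:
            # Duplicate matches in B repeat A's values but never change max.
            result[row_id] = max(groups["A"])
    return result


# Hypothetical sample data.
a = [(1, "apple"), (1, "pear"), (2, "fig"), (3, "kiwi")]
b = [(1,), (2,), (2,), (4,)]
print(single_stage_join_group(a, b))
```

Ids 3 and 4 drop out because they appear in only one table, which matches inner-join semantics. The point of the sketch is only that one partitioning by id is enough for both operations.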

Chen