Hive >> mail # user >> single MR stage for join and group by


Re: single MR stage for join and group by
And what version of Hive are you running your test on? I believe, though I am
not certain, that Hive 0.11 includes the optimization you seek.
On Thu, Aug 1, 2013 at 10:19 AM, Chen Song <[EMAIL PROTECTED]> wrote:

> Suppose we have 2 simple tables
>
> A
> id int
> value string
>
> B
> id
>
> When hive translates the following query
>
> select max(A.value), A.id from A join B on A.id = B.id group by A.id;
>
> It launches 2 stages, one for the join and one for the group by.
>
> My understanding is that if the join key set is a subset of the group by
> key set, both can be done in the same MapReduce job. If that is correct
> in theory, could it be a feature in Hive?
>
> Chen
>
>
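
To illustrate the reasoning in the question, here is a minimal Python sketch (not Hive code) of why one shuffle can serve both operators when the join key is also the group-by key: every row of A and B with the same id reaches the same reducer, so that reducer can check the join condition and compute the aggregate in one pass. The function name and the in-memory "shuffle" are hypothetical, and the sketch assumes ids are non-null and that the intended join condition is A.id = B.id.

```python
from collections import defaultdict


def single_stage_join_group_max(a_rows, b_rows):
    """Simulate one MapReduce job for:
    SELECT max(A.value), A.id FROM A JOIN B ON A.id = B.id GROUP BY A.id
    a_rows: iterable of (id, value) pairs; b_rows: iterable of ids.
    """
    # Map + shuffle: partition rows from both tables by id, tagged by source.
    shuffled = defaultdict(lambda: {"A": [], "B": 0})
    for row_id, value in a_rows:
        shuffled[row_id]["A"].append(value)
    for row_id in b_rows:
        shuffled[row_id]["B"] += 1

    # Reduce: one call per id. The inner join keeps the id only if both
    # sides have rows; max() is unaffected by how many B rows matched.
    result = {}
    for row_id, group in shuffled.items():
        if group["A"] and group["B"] > 0:
            result[row_id] = max(group["A"])
    return result


if __name__ == "__main__":
    a = [(1, "x"), (1, "z"), (2, "y"), (3, "q")]
    b = [1, 1, 3, 4]
    print(single_stage_join_group_max(a, b))
```

Aggregates that depend on row multiplicity, such as count or sum, would also have to account for the number of matching B rows per id, which this sketch tracks but does not use.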