Hive >> mail # user >> MapSide join in Hive


MapSide join in Hive
Hi all,

I am joining two datasets: one is around 1.5 TB in size and the other is
around 350 MB.

I want to do a map-side join using "id" as the join column between the
two tables. I read about map-side joins in Hive at
http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins. Are there any
technical specs on map-side joins on a wiki or in JIRA?
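
For concreteness, the kind of query in question would look roughly like the
sketch below, using the MAPJOIN hint described on that wiki page. The table
names (big_table, small_table) and the non-key columns (payload, attribute)
are placeholders for illustration, not the actual schema.

```sql
-- Sketch only: big_table (~1.5 TB) and small_table (~350 MB) are placeholder names.
-- The MAPJOIN hint asks Hive to load the hinted table (alias s) into memory
-- in each mapper, so the join can run without a reduce phase.
SELECT /*+ MAPJOIN(s) */ b.id, b.payload, s.attribute
FROM big_table b
JOIN small_table s ON (b.id = s.id);
```

The hint names the smaller table's alias, since that is the side that has to
fit in memory.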

Here are some questions:

1) Do the tables need to be sorted on "id"?

2) Is there a restriction on the smaller table size?

Are there other join optimizations that Hive provides which I can apply
here?

 

Viraj
