Hive >> mail # user >> MapSide join in Hive


RE: MapSide join in Hive
Hi Amr,

Thanks for your help. I will try the STREAMTABLE hint if one of the
datasets exceeds 1GB.

Viraj

 

________________________________

From: Amr Awadallah [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 26, 2010 12:58 AM
To: [EMAIL PROTECTED]
Subject: Re: MapSide join in Hive

 

Viraj,

1. No, the tables do not need to be sorted on "id".
2. Yes, the smaller table needs to fit in JVM memory (a small table of
more than about 1GB is typically too large).
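
As an illustration of the point above, a map-side join is usually requested with the MAPJOIN hint, naming the table that should be loaded into memory. This is only a sketch: the table names (big_events, small_dim) and the columns other than "id" are hypothetical stand-ins for the 1.5TB and 350MB datasets.

```sql
-- Hypothetical tables: big_events (~1.5TB) and small_dim (~350MB).
-- MAPJOIN(s) asks Hive to load small_dim into memory on each mapper,
-- so small_dim must fit in the JVM heap.
SELECT /*+ MAPJOIN(s) */ b.id, b.payload, s.label
FROM big_events b
JOIN small_dim s ON (b.id = s.id);
```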

See slide 7 onward in this presentation for other join strategies that
can help if the tables are bucketed and sorted.

http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
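
For the bucketed and sorted case, here is a rough sketch of the session settings involved. It assumes both tables were created with CLUSTERED BY (id) SORTED BY (id) and compatible bucket counts. The table and column names are hypothetical, and whether these properties are available depends on the Hive version in use.

```sql
-- Assumption: big_events and small_dim are both bucketed and sorted on id.
set hive.optimize.bucketmapjoin = true;
-- The next two settings additionally request a sort-merge bucket join.
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

SELECT /*+ MAPJOIN(s) */ b.id, b.payload, s.label
FROM big_events b
JOIN small_dim s ON (b.id = s.id);
```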

There is also the /*+ STREAMTABLE(tablealias) */ hint, which you should
use on the very large table (alternatively, make sure that table is the
rightmost one in the join clause).
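
A sketch of both options, using hypothetical table and column names (big_events, small_dim, payload, label). In a regular reduce-side join, Hive buffers rows from the earlier tables and streams the last one, so the large table should be the streamed one.

```sql
-- Option 1: name the large table explicitly as the streamed table.
SELECT /*+ STREAMTABLE(b) */ b.id, b.payload, s.label
FROM big_events b
JOIN small_dim s ON (b.id = s.id);

-- Option 2: no hint; put the large table last (rightmost) in the join.
SELECT b.id, b.payload, s.label
FROM small_dim s
JOIN big_events b ON (s.id = b.id);
```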

-- amr

On 6/24/2010 10:43 AM, Viraj Bhat wrote:

Hi all,

 I am joining 2 datasets, one is around 1.5TB in size and the other is
around 350MB in size.

I want to do a map-side join between the two tables, using "id" as the
join column. I read about the map-side join in the Hive language manual:

http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins

Are there any technical specs on the map-side join on a wiki/jira?

Here are some questions:

1. Do the tables need to be sorted on "id"?
2. Is there a restriction on the smaller table size?
3. Are there other join optimizations that Hive provides which I can
apply here?

 

Viraj
