Viraj Bhat 2010-06-24, 17:43
Re: MapSide join in Hive
Amr Awadallah 2010-06-26, 07:58
Viraj,

1. No.
2. Yes, the smaller table needs to fit in JVM memory (typically, more than 1GB for the small table is too large).

See slide 7 and after in this preso for different join strategies that can help in case the tables are bucketed and sorted:
http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team

There is also the /*+STREAMTABLE(tablealias)*/ hint, which you should use for very large tables (or make sure the very large table is the rightmost table in the join clause).

-- amr

On 6/24/2010 10:43 AM, Viraj Bhat wrote:
> Hi all,
>
> I am joining 2 datasets, one is around 1.5TB in size and the other is
> around 350MB in size.
>
> I wanted to do a Map Side join using "id" as the join column between
> the two tables. I read about the Mapside join in Hive.
>
> http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins. Are there
> some technical specs on Mapside join on a wiki/jira?
>
> Here are some questions:
>
> 1) Do the tables need to be sorted on "id"?
>
> 2) Is there a restriction on the smaller table size?
>
> Are there other join optimizations that Hive provides which I can
> apply here?
>
> Viraj
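
For reference, the two hints discussed in this thread could be written roughly as below. This is only a sketch: the table names (big_events, small_dim) and the columns other than "id" are made up for illustration, standing in for the ~1.5TB and ~350MB datasets from the question.

```sql
-- Hypothetical tables: big_events (the ~1.5TB side) and small_dim (the ~350MB side),
-- both carrying the join column "id".

-- Map-side join: the MAPJOIN hint names the table that each mapper loads into memory,
-- so that table has to fit in the JVM heap.
SELECT /*+ MAPJOIN(s) */ b.id, b.payload, s.label
FROM big_events b
JOIN small_dim s ON (b.id = s.id);

-- Regular (reduce-side) join: the STREAMTABLE hint names the table that is streamed
-- through the reducers instead of being buffered. Without the hint, the rightmost
-- table in the join clause is the one that gets streamed.
SELECT /*+ STREAMTABLE(b) */ b.id, b.payload, s.label
FROM big_events b
JOIN small_dim s ON (b.id = s.id);
```

In the first query no sort order on "id" is needed; the only constraint is the memory footprint of the hinted table.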
Viraj Bhat 2010-06-29, 20:42