Hive >> mail # user >> FW: Big table join optimization

Re: FW: Big table join optimization
Let's see the minimal query that shows your problem, with some comments about
the cardinality of the tables in the join. Maybe there could be a crude
workaround using a temp table or some such device if nothing jumps out at
On Thu, Jan 30, 2014 at 4:07 PM, Guy Doulberg <[EMAIL PROTECTED]> wrote:

> Hi guys,
> I am trying to optimize a Hive join query that joins two big
> tables. The join is taking too long: no matter how many
> reducers I set, there are always two reducers struggling to finish at the
> end of the job.
> The job does not always finish; sometimes it fails with memory problems.
> In the reducers that complete quickly I can see:
> 7688459 rows: used memory = 991337736
> In the long-running reducers:
> 43363436 rows: used memory = 1142368456
> At first I thought I was dealing with a skewed key, but I set
> hive.optimize.skewjoin to true and it didn't change a thing. I played
> with hive.skewjoin.key, which also didn't change a thing.
> Any other ideas I can try?
> I am using Hive 0.10 from CDH4.2.1.
> The source tables use customized SerDes.
> Thanks,
> Guy Doulberg
> Team leader @ Perion
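
For what it's worth, here is a small Python sketch of why adding reducers may not help when a couple of join keys dominate. It assumes a simplified model in which each row is routed to reducer hash(key) % num_reducers, which is only an approximation of hash partitioning. The key values, row counts, and function names are made up for illustration and are not taken from the job above.

```python
from collections import Counter


def synthetic_keys():
    """Build a hypothetical join-key column: two hot keys plus a long tail."""
    hot = [1] * 40000 + [2] * 40000        # two keys carrying most of the rows
    tail = list(range(1000, 21000))        # 20000 distinct keys, one row each
    return hot + tail


def reducer_loads(keys, num_reducers):
    """Count rows per reducer when rows are routed by hash(key) % num_reducers."""
    loads = [0] * num_reducers
    for key in keys:
        loads[hash(key) % num_reducers] += 1
    return loads


def heaviest_keys(keys, n=2):
    """Return the n most frequent keys with their row counts."""
    return Counter(keys).most_common(n)


if __name__ == "__main__":
    keys = synthetic_keys()
    print("heaviest keys:", heaviest_keys(keys))
    for num_reducers in (10, 100, 1000):
        loads = sorted(reducer_loads(keys, num_reducers), reverse=True)
        # The two largest loads stay pinned by the hot keys; only the tail shrinks.
        print(num_reducers, "reducers -> top loads:", loads[:3])
```

In this model all rows for one key land on the same reducer, so the two busiest reducers never drop below the size of the hot keys however many reducers are configured. Running the equivalent of heaviest_keys against the real join column (a GROUP BY count on the join key) would show whether the same pattern applies here.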