Hive >> mail # user >> FW: Big table join optimization


Guy Doulberg 2014-01-31, 00:07
Re: FW: Big table join optimization
Let's see the minimal query that shows your problem, with some comments
about the cardinality of the tables in the join. Maybe there could be a
crude workaround using a temp table or some such device, if nothing
jumps out at us.
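A crude version of that temp-table workaround might look like the
following HiveQL sketch. All names here (big_a, big_b, join_key, val,
tmp_hot_keys, joined) are hypothetical, as is the row-count threshold;
the idea is simply to pull the hot keys out of the reduce-side join and
handle them with a map join, assuming the matching slice of the second
table fits in memory.

```sql
-- 1. Isolate the hot keys into a temp table (threshold is a guess to tune).
CREATE TABLE tmp_hot_keys AS
SELECT join_key
FROM big_a
GROUP BY join_key
HAVING COUNT(*) > 1000000;

-- 2. Join the non-skewed rows with a normal reduce-side join.
INSERT OVERWRITE TABLE joined
SELECT a.join_key, b.val
FROM big_a a
JOIN big_b b ON a.join_key = b.join_key
LEFT OUTER JOIN tmp_hot_keys h ON a.join_key = h.join_key
WHERE h.join_key IS NULL;

-- 3. Handle the hot keys separately: restrict big_b to those keys and
--    map-join the (hopefully small) result against big_a.
INSERT INTO TABLE joined
SELECT /*+ MAPJOIN(b) */ a.join_key, b.val
FROM big_a a
JOIN (SELECT b.join_key, b.val
      FROM big_b b
      JOIN tmp_hot_keys h ON b.join_key = h.join_key) b
  ON a.join_key = b.join_key;
```

The subquery in step 3 is used instead of an IN-subquery because Hive
0.10 only supports subqueries in the FROM clause.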
On Thu, Jan 30, 2014 at 4:07 PM, Guy Doulberg <[EMAIL PROTECTED]> wrote:

>
> Hi guys,
>
> I am trying to optimize a Hive join query: a join between two big
> tables. The join is taking too long; no matter how many reducers I
> set, there are always two reducers struggling to finish at the end of
> the job. The job does not always complete; sometimes it fails with
> memory problems.
>
> In the reducers that complete quickly I can see:
> 7688459 rows: used memory = 991337736
>
> In the long-running reducers:
>
> 43363436 rows: used memory = 1142368456
>
>
> At first I thought I was dealing with a skewed key, so I set
> hive.optimize.skewjoin to true, but it didn't change a thing. I also
> played with hive.skewjoin.key; that didn't change anything either.
>
> Any other ideas I can try?
>
> I am using Hive 0.10 on CDH 4.2.1.
>
> The source tables use customized SerDes.
>
>
> Thanks
> Guy Doulberg
> Team leader @ Perion
>
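For reference, the skew-join settings discussed above, shown with the
Hive 0.10 default values (the two "related knobs" are an assumption
worth verifying against your CDH build):

```sql
SET hive.optimize.skewjoin = true;
-- Number of rows for a single key before it is treated as skewed:
SET hive.skewjoin.key = 100000;
-- Related knobs (assumed available in Hive 0.10):
SET hive.skewjoin.mapjoin.map.tasks = 10000;     -- map tasks for the follow-up map join
SET hive.skewjoin.mapjoin.min.split = 33554432;  -- min split size, in bytes
```

When the optimization does kick in, Hive spills rows for the skewed
keys to HDFS and re-joins them in a conditional follow-up map-join
stage, so one way to check whether it is actually taking effect is to
look for those extra conditional tasks in the job logs.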