Hive >> mail # user >> FW: Big table join optimization


Re: FW: Big table join optimization
Let's see the minimal query that shows your problem, with some comments about
the cardinality of the tables in the join. Maybe there could be a crude
workaround using a temp table or some such device if nothing jumps out at
us.
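
To illustrate the kind of temp-table workaround suggested above: if a handful of hot keys are overloading two reducers, one crude option is to split those keys out and join them separately. This is only a sketch; the table and column names (big_a, big_b, join_key, val) and the row-count threshold are placeholders, not anything from the original thread:

```sql
-- Hypothetical sketch: isolate the hot keys so their rows don't all
-- land on the same reducers. Names and thresholds are placeholders.

-- 1. Find the keys that dominate one side of the join.
CREATE TABLE tmp_hot_keys AS
SELECT join_key
FROM big_a
GROUP BY join_key
HAVING COUNT(*) > 1000000;   -- threshold is a guess; tune for your data

-- 2. Join everything except the hot keys the normal way.
INSERT OVERWRITE TABLE joined_cold
SELECT a.*, b.val
FROM big_a a
JOIN big_b b ON a.join_key = b.join_key
LEFT OUTER JOIN tmp_hot_keys h ON a.join_key = h.join_key
WHERE h.join_key IS NULL;

-- 3. Handle the hot keys separately, e.g. with a map join if the
--    matching slice of big_b is small enough to fit in memory.
INSERT OVERWRITE TABLE joined_hot
SELECT /*+ MAPJOIN(b) */ a.*, b.val
FROM big_a a
JOIN (SELECT b2.*
      FROM big_b b2
      JOIN tmp_hot_keys h ON b2.join_key = h.join_key) b
ON a.join_key = b.join_key;
```

The union of joined_cold and joined_hot is the full join result; whether step 3 can use a map join depends on how large the hot-key slice of the smaller table actually is.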
On Thu, Jan 30, 2014 at 4:07 PM, Guy Doulberg <[EMAIL PROTECTED]> wrote:

>
>   Hi guys,
>
> I am trying to optimize a Hive join query between two big tables. The
> join is taking too long: no matter how many reducers I set, there are
> always two reducers struggling to finish at the end of the job.
> The job does not always end; sometimes it fails with memory problems.
>
> In the reducers that complete quickly, I can see:
> 7688459 rows: used memory = 991337736
>
> In the long-running reducers:
>
> 43363436 rows: used memory = 1142368456
>
>
> At first I thought I was dealing with a skew key, but I set
> hive.optimize.skewjoin to true and it didn't change a thing. I also played
> with hive.skewjoin.key, which didn't change a thing either.
>
> Any other ideas I can try?
>
> I am using Hive 0.10 from CDH 4.2.1.
>
> The source tables use customized SerDes.
>
>
> Thanks
> Guy Doulberg
> Team leader @ Perion
>

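One quick way to confirm (or rule out) key skew before digging further is to count rows per join key on each side; a single key with tens of millions of rows would match the 43M-row reducer reported above. Again a sketch with placeholder names (big_a, join_key):

```sql
-- Hypothetical skew check: the top keys by row count on one side
-- of the join. Run the same query against the other table too.
SELECT join_key, COUNT(*) AS cnt
FROM big_a
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;

-- For reference, the skew-join settings mentioned in the question
-- are set per session:
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;  -- rows per key before Hive treats it as skewed
```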