Skew Joins borked on Hive11 (Hadoop23)?
Hello, all.

Has anyone noticed that skew-joins aren't working on Hive 0.11 / Hadoop 0.23?

I've been running the TPC-H benchmarks against Hive 0.11, and none of the queries run to completion when hive.optimize.skewjoin is set to true.
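
For readers who want to try this, here is a minimal sketch of how the setting is typically toggled in a Hive session. Only hive.optimize.skewjoin comes from this report. The hive.skewjoin.key line is an assumption added for illustration (it is the row-count threshold above which a join key is treated as skewed, shown at its documented default) and is not taken from the runs described here.

```sql
-- Enable the runtime skew-join optimization (the setting named in this report).
set hive.optimize.skewjoin=true;

-- Assumption for illustration: threshold of rows per join key above which
-- the key is handled as skewed. 100000 is the documented default.
set hive.skewjoin.key=100000;

-- To compare behaviour, the same query can be rerun with the optimization off.
-- set hive.optimize.skewjoin=false;
```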

I initially ran into problems like the following:

Ended Job = job_1371646843240_1214
java.io.FileNotFoundException: File hdfs://fstaxxx.yyy.yahoo.com/tmp/hive_2013-07-12_03-22-31_737_6843191588894968654/-mr-10004/hive_skew_join_bigkeys_0 does not exist.

Patching Hive 0.11 with HIVE-4646 resolved that problem.

What I see now is that a couple of stages of the query run successfully, after which I get the following output and the remaining stages are skipped.

2013-07-12 23:21:02,164 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 15985.47 sec
MapReduce Total cumulative CPU time: 0 days 4 hours 26 minutes 25 seconds 470 msec
Ended Job = job_1371646843240_1295
Stage-10 is filtered out by condition resolver.
MapReduce Jobs Launched:
Job 0: Map: 380  Reduce: 118   Cumulative CPU: 15900.35 sec   HDFS Read: 24574270287 HDFS Write: 4925478398 SUCCESS
Total MapReduce CPU Time Spent: 0 days 4 hours 25 minutes 0 seconds 350 msec
Time taken: 109.411 seconds
FAILED: SemanticException [Error 10001]: Line 10:5 Table not found 'q16_tmp_cached'

In this particular case, the query is q16_parts_supplier_relationship.hive, part of which looks like:

create table q16_tmp_cached as
select
  p_brand, p_type, p_size, ps_suppkey
from
  partsupp ps join part p
  on
    p.p_partkey = ps.ps_partkey and p.p_brand <> 'Brand#45'
    and not p.p_type like 'MEDIUM POLISHED%'
  join supplier_tmp_cached s
  on
    ps.ps_suppkey = s.s_suppkey;

If I can isolate the problem to a smaller test case, I'll raise a JIRA. I was hoping one of you might have seen this already, or might have a better handle on how skew joins work in Hive 0.11.

Many thanks,