Hive, mail # user - Skew join failure


Skew join failure
David Morel 2012-11-30, 10:10
Hi,

I am trying to fix the "last reducer hangs in GC because it receives
truckloads of data" problem that I have on some queries by using SET
hive.optimize.skewjoin=true; Unfortunately, every time I try this, I
get an error of the form:
...
2012-11-30 10:42:39,181 Stage-10 map = 100%,  reduce = 100%, Cumulative CPU 406984.1 sec
MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
Ended Job = job_201211281801_0463
java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
         at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
         at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
...

Googling didn't turn up anything on how to debug or solve this, so
I'd be glad for any pointers on where to start looking.

I'm using CMF4.0 currently, so Hive 0.8.1.
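
For reference, a minimal sketch of the session setup involved. Only
hive.optimize.skewjoin=true comes from the message above; the other
properties are the stock skew-join settings shown at their documented
defaults (an assumption, not values taken from the failing job), and the
table and column names are hypothetical.

```sql
-- Enable the runtime skew-join optimization (the setting used before the error above).
SET hive.optimize.skewjoin=true;
-- Assumed default: a join key is treated as skewed once it exceeds this many rows.
SET hive.skewjoin.key=100000;
-- Assumed defaults for the follow-up map join that processes the skewed keys.
SET hive.skewjoin.mapjoin.map.tasks=10000;
SET hive.skewjoin.mapjoin.min.split=33554432;

-- Hypothetical query shape: a join whose key distribution is heavily skewed.
SELECT a.k, a.v, b.w
FROM big_table a
JOIN other_table b ON (a.k = b.k);
```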

Thanks a lot!

David Morel