Hive >> mail # user >> Skew join failure


Skew join failure
Hi,

I am trying to solve the "last reducer hangs because of GC because of
truckloads of data" issue that I have on some queries by using
SET hive.optimize.skewjoin=true. Unfortunately, every time I try this,
I encounter an error of the form:
...
2012-11-30 10:42:39,181 Stage-10 map = 100%,  reduce = 100%, Cumulative CPU 406984.1 sec
MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
Ended Job = job_201211281801_0463
java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
         at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
         at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
...

Googling didn't turn up anything on how to debug or solve this, so
I'd be glad for any pointers on where to start looking.

I'm using CMF4.0 currently, so Hive 0.8.1.
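
For anyone trying to reproduce this, here is a minimal sketch of the kind
of session involved. It is not the actual failing query: the table and
column names (big_table, small_table, join_key, val) are hypothetical, and
the threshold shown for hive.skewjoin.key is the documented default, not a
value taken from this job. The second query is one possible way to check
which join keys carry most of the rows.

```sql
-- Sketch only: big_table, small_table, join_key and val are hypothetical names.

-- Enable the runtime skew join optimization.
SET hive.optimize.skewjoin=true;
-- Keys with more rows than this threshold are treated as skewed and are
-- handled in a follow-up map join; 100000 is the documented default.
SET hive.skewjoin.key=100000;

-- A join of the kind that can leave one reducer with most of the data.
SELECT a.join_key, b.val
FROM big_table a
JOIN small_table b ON (a.join_key = b.join_key);

-- Assumed diagnostic: list the most frequent join keys in the large input.
SELECT join_key, COUNT(*) AS cnt
FROM big_table
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;
```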

Thanks a lot!

David Morel