Hive, mail # user - Skew join failure


Re: Skew join failure
David Morel 2012-12-03, 20:25
On 30 Nov 2012, at 16:46, Mark Grover wrote:

> Hi David, It seems like Hive is unable to find the skewed keys on
> HDFS. Did you set hive.skewjoin.key property? If so, to what value?

Hey Mark,

thanks for answering!

I didn't set it to anything, but left it at its default value (100,000,
IIRC). I should probably have set it to a much lower value (I guess?),
but I fail to understand why not meeting the threshold would break the
whole thing. I guess I have to inspect the logs more closely? Do you
have real-life examples of skewjoin parameter settings? The docs are
really scarce about it...
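
For reference, a minimal sketch of the session settings under
discussion, assuming the property names and default from the Hive
configuration documentation. The lowered threshold of 10000 is only an
illustrative value, not a tested recommendation:

```sql
-- Enable the runtime skew join optimization (off by default).
SET hive.optimize.skewjoin=true;

-- Number of rows sharing one join key above which that key is
-- treated as skewed. The default is 100000; 10000 here is a
-- hypothetical lower threshold chosen for illustration only.
SET hive.skewjoin.key=10000;
```

Running SET hive.skewjoin.key; with no value in the same session should
print the value currently in effect, which is a quick way to confirm
what the query actually ran with.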

thanks!

David

>
> Mark
>
> On Fri, Nov 30, 2012 at 2:10 AM, David Morel
> <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I am trying to solve the "last reducer hangs because of GC because of
>> truckloads of data" issue that I have on some queries, by using SET
>> hive.optimize.skewjoin=true; Unfortunately, every time I try this, I
>> encounter an error of the form:
>>
>> ...
>> 2012-11-30 10:42:39,181 Stage-10 map = 100%, reduce = 100%, Cumulative CPU 406984.1 sec
>> MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
>> Ended Job = job_201211281801_0463
>> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
>>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
>>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
>>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
>>     ...
>>
>> Googling didn't give me any indication of how to debug/solve this, so
>> I'd be glad to get any pointers on where to start looking.
>>
>> I'm using CMF4.0 currently, so Hive 0.8.1.