David Morel 2012-11-30, 10:10
Mark Grover 2012-11-30, 15:46
David Morel 2012-12-03, 20:25
Re: Skew join failure
Mark Grover 2012-12-04, 06:01
Sure thing. Play around with that property's value and see if that makes any
difference.
Also, could you check whether a file with a name like
*hive_skew_join_bigkeys* exists on HDFS? Perhaps Hive is looking at a
different path. If so, we can figure out how to fix that.
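
For reference, here is a minimal sketch of both suggestions as they might be typed in the Hive CLI. The threshold of 10000 is an arbitrary illustrative value, not a recommendation. The scratch path assumes the default /tmp/hive-<user> layout that appears in the error message, and the two-level wildcard is a guess at the directory depth; adjust all of these for your cluster.

```sql
-- Enable the skew join optimization, as in the original session.
SET hive.optimize.skewjoin=true;

-- Print the current threshold (rows per join key before a key is
-- treated as skewed); the default is 100000.
SET hive.skewjoin.key;

-- Try a lower threshold. The value 10000 is hypothetical and only
-- meant as a starting point for experimentation.
SET hive.skewjoin.key=10000;

-- Look for skew join big-key output under the scratch directory.
-- The user name and the wildcard depth are assumptions taken from
-- the path in the stack trace.
dfs -ls /tmp/hive-dmorel/*/*/hive_skew_join_bigkeys*;
```

If the dfs listing comes back empty while the query is still running, that would suggest the big-key files are being written somewhere other than where the resolver looks for them.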
On Mon, Dec 3, 2012 at 12:25 PM, David Morel <[EMAIL PROTECTED]> wrote:
> On 30 Nov 2012, at 16:46, Mark Grover wrote:
>> Hi David, it seems like Hive is unable to find the skewed keys on
>> HDFS. Did you set the hive.skewjoin.key property? If so, to what value?
> Hey Mark,
> thanks for answering!
> I didn't set it to anything, but left it at its default value (100,000
> IIRC). I should probably have set it to a much lower value (I guess?)
> but I fail to understand why not meeting the threshold would break the
> whole thing. I guess I have to inspect the logs more closely? Do you
> have real-life examples of skewjoin parameter settings? The docs are
> really scarce about it...
>> On Fri, Nov 30, 2012 at 2:10 AM, David Morel <[EMAIL PROTECTED]> wrote:
>>> I am trying to solve the "last reducer hangs because of GC because of
>>> truckloads of data" issue that I have on some queries, by using SET
>>> hive.optimize.skewjoin=true; Unfortunately, every time I try this, I
>>> encounter an error of the form:
>>>
>>> ...
>>> 2012-11-30 10:42:39,181 Stage-10 map = 100%, reduce = 100%, Cumulative CPU 406984.1 sec
>>> MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
>>> Ended Job = job_201211281801_0463
>>> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-[...]mr-10014/hive_skew_join_bigkeys_0 does not exist.
>>> at [...]DistributedFileSystem.java:365)
>>> at [...]getTasks(ConditionalResolverSkewJoin.java:96)
>>> at [...]ConditionalTask.java:81)
>>> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
>>> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
>>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
>>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
>>> ...
>>>
>>> Googling didn't give me any indication on how to debug/solve this, so
>>> I'd be glad if I could get any indication where to start looking.
>>> I'm using CMF4.0 currently, so Hive 0.8.1.