Hive >> mail # user >> Skew join failure


David Morel 2012-11-30, 10:10
Mark Grover 2012-11-30, 15:46
David Morel 2012-12-03, 20:25
Re: Skew join failure
Hey David,
Sure thing. Play around with that property's value, see if that makes any
difference.

Also, could you check whether a file with a name like
hive_skew_join_bigkeys exists on HDFS? Perhaps it's looking at a different
path. If so, we can figure out how to fix that.

Mark
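
The two suggestions above can be sketched as a Hive CLI session. This is only an illustration under stated assumptions: the value 10000 is an arbitrary choice for experimentation, and the scratch root /tmp/hive-dmorel is taken from the stack trace quoted below. Hive may clean up the scratch directory once the query ends, so the listing is most likely to be informative while the job is still running.

```sql
-- Enable the skew join optimization (the setting from the original report).
SET hive.optimize.skewjoin=true;

-- Number of rows per join key above which the key is treated as skewed.
-- 100000 is the default mentioned in the thread; 10000 is an arbitrary
-- lower value for experimentation, not a recommendation.
SET hive.skewjoin.key=10000;

-- Print the effective value to confirm the session picked it up.
SET hive.skewjoin.key;

-- Look for skew-join big-key files under the scratch directory.
-- Assumption: the scratch root is /tmp/hive-dmorel, as in the stack trace.
dfs -ls /tmp/hive-dmorel/*/*/hive_skew_join_bigkeys*;
```

If the listing shows the files under a different parent directory than the one named in the exception, that would support the "different path" theory.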

On Mon, Dec 3, 2012 at 12:25 PM, David Morel <[EMAIL PROTECTED]> wrote:

> On 30 Nov 2012, at 16:46, Mark Grover wrote:
>
>> Hi David, It seems like Hive is unable to find the skewed keys on
>> HDFS. Did you set the hive.skewjoin.key property? If so, to what value?
>>
>
> Hey Mark,
>
> thanks for answering!
>
> I didn't set it to anything, but left it at its default value (100,000
> IIRC). I should probably have set it to a much lower value (I guess?)
> but I fail to understand why not meeting the threshold would break the
> whole thing. I guess I have to inspect the logs more closely? Do you
> have real-life examples of skewjoin parameter settings? The docs are
> really scarce about it...
>
> thanks!
>
> David
>
>
>> Mark
>>
>> On Fri, Nov 30, 2012 at 2:10 AM, David Morel <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I am trying to solve the "last reducer hangs because of GC because of
>>> truckloads of data" issue that I have on some queries, by using SET
>>> hive.optimize.skewjoin=true; Unfortunately, every time I try this, I
>>> encounter an error of the form:
>>>
>>> ...
>>> 2012-11-30 10:42:39,181 Stage-10 map = 100%, reduce = 100%, Cumulative CPU 406984.1 sec
>>> MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
>>> Ended Job = job_201211281801_0463
>>> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
>>>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
>>>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
>>>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
>>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
>>>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
>>> ...
>>>
>>>
>>> Googling didn't give me any indication on how to debug/solve this, so
>>> I'd be glad if I could get any indication where to start looking.
>>>
>>> I'm using CMF4.0 currently, so Hive 0.8.1.
>>>
>>