Re: hive.optimize.skewjoin problem
Looks like there is already a JIRA for this:
https://issues.apache.org/jira/browse/HIVE-4693. It repros in Hive 0.11 too.
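
Until a fix is available, the workaround from the original message quoted below is to switch the optimization off. A minimal sketch for a Hive CLI session, reusing query2 and the table names from that message (the setting only affects the current session, and I have not verified it on every affected version):

```sql
-- Turn off the skew join optimization for this session only.
SET hive.optimize.skewjoin=false;

-- query2 from the original report; with the setting above,
-- the reporter saw results printed again.
SELECT DISTINCT co.city
FROM company1 co
INNER JOIN customer1 cu ON (co.city = cu.city);
```

The trade-off is that heavily skewed join keys are no longer split out into a follow-up job, so joins on genuinely skewed data may run slower.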
On Thu, Aug 1, 2013 at 3:30 AM, Chandraprakash Bhagtani <
[EMAIL PROTECTED]> wrote:

> I found a clue on this. I was running a patched Hive, and the patch was
> swallowing the exception. When I reverted the patch, I saw the following
> exception in both cases (query1 and query2):
>
> java.io.FileNotFoundException: File hdfs://mycluster/tmp/hive-training/hive_2013-08-01_03-24-07_554_3719658871426124253/-mr-10002/hive_skew_join_bigkeys_0 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:410)
>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:102)
>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
>
> So it seems that the hive_skew_join_bigkeys_0 file is not being created by
> the previous stage. I couldn't locate the source code that creates this
> file. With query2, the result is still generated even after the exception
> is printed. BTW, I am running Hive 0.10 (CDH 4.3).
>
> Any clue?
>
>
>
> On Thu, Aug 1, 2013 at 3:11 PM, Chandraprakash Bhagtani <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I ran into a weird issue with Hive today. I ran the following two queries:
>>
>> query1:   select co.city from company1 co inner join customer1 cu on
>> (co.city=cu.city);
>>
>> query2:  select distinct co.city from company1 co inner join customer1 cu
>> on (co.city=cu.city);
>>
>>
>> The only difference between these queries is the DISTINCT keyword. The
>> first query prints its result, but the second query printed no result
>> and showed no error.
>>
>> When I disabled the skew join optimization by setting
>> "hive.optimize.skewjoin=false", query2 started printing results too.
>>
>> Can anyone explain what the issue with skewjoin is here?
>>
>> --
>> Thanks & Regards,
>> Chandra Prakash Bhagtani
>>
>
>
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani
>