Hive >> mail # user >> hive.optimize.skewjoin problem


Re: hive.optimize.skewjoin problem
Looks like there is already a JIRA for this:
https://issues.apache.org/jira/browse/HIVE-4693. It repros in Hive 0.11 too.
On Thu, Aug 1, 2013 at 3:30 AM, Chandraprakash Bhagtani <
[EMAIL PROTECTED]> wrote:

> I got a clue on this. I was actually running a patched Hive, which was
> swallowing the exception. When I reverted the patch, I saw the following
> exception in both cases (query1 and query2):
>
> java.io.FileNotFoundException: File hdfs://mycluster/tmp/hive-training/hive_2013-08-01_03-24-07_554_3719658871426124253/-mr-10002/hive_skew_join_bigkeys_0 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:410)
>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:102)
>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
>
> So it seems that the hive_skew_join_bigkeys_0 file is not being created by
> the previous stage. I couldn't locate the source where this file is
> created. With query2, the result is generated even after the exception is
> printed. BTW, I am running Hive 0.10 (cdh4.3).
>
> Any clue?
>
>
>
> On Thu, Aug 1, 2013 at 3:11 PM, Chandraprakash Bhagtani <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I was facing a weird issue with Hive today. I ran the following 2 queries:
>>
>> query1:   select co.city from company1 co inner join customer1 cu on
>> (co.city=cu.city);
>>
>> query2:  select distinct co.city from company1 co inner join customer1 cu
>> on (co.city=cu.city);
>>
>>
>> The only difference between these queries is the distinct keyword. The first
>> query prints the result, but the second query printed no result and
>> showed no error.
>>
>> When I disabled the skewjoin optimization by setting
>> "hive.optimize.skewjoin=false", query2 started printing the results too.
>>
>> Can anyone explain what the issue with skewjoin is here?
>>
>> --
>> Thanks & Regards,
>> Chandra Prakash Bhagtani
>>
>
>
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani
>
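
For anyone hitting the same thing, the workaround described in the quoted message can be sketched as a Hive CLI session. This is only a sketch: the table and column names (company1, customer1, city) are taken from the queries above, and it assumes the setting is applied per session rather than in hive-site.xml.

```sql
-- Turn off the skew join optimization for this session only
-- (the workaround reported in the quoted message).
SET hive.optimize.skewjoin=false;

-- query2 from the thread: the DISTINCT variant that returned no rows
-- while the optimization was enabled.
SELECT DISTINCT co.city
FROM company1 co
INNER JOIN customer1 cu
  ON (co.city = cu.city);
```

Running SET hive.optimize.skewjoin; with no value prints the current setting, which is a quick way to confirm the change took effect before rerunning the query.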