Hive, mail # user - Skew Join Optimization in hive


Re: Skew Join Optimization in hive
Igor Tatarinov 2011-06-07, 19:58
Have you tried splitting the query into 2 or 3 steps and/or enabling map
joins (SET hive.auto.convert.join = true;) if some of the tables are
smallish?
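
A rough sketch of what the two suggestions could look like together. The table and column names (fact_a, fact_b, dim_small, tmp_step1, id, key1, val, name) are made up for illustration and are not from the original query. Whether Hive actually converts a join to a map join depends on your Hive version and on the sizes of the tables involved.

```sql
-- Let Hive convert joins against small tables into map-side joins.
SET hive.auto.convert.join = true;

-- Step 1: join the large tables first and materialize the result.
-- All table and column names here are hypothetical.
CREATE TABLE tmp_step1 AS
SELECT a.id, a.key1, b.val
FROM fact_a a
LEFT OUTER JOIN fact_b b ON (a.id = b.id);

-- Step 2: join the intermediate result with a small table.
-- The small table is on the right side of the left outer join,
-- which is the side a map join can load into memory.
SELECT t.id, t.val, d.name
FROM tmp_step1 t
LEFT OUTER JOIN dim_small d ON (t.key1 = d.key1);
```

Splitting the query this way also makes it easier to see which join produces the skewed key, since each step runs as its own job.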
On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a query which joins 12 different tables (most of them left outer
> joins) and the query takes almost 3 hours. 90% of the time is taken by a
> single reducer. One reducer is getting the bulk of the data to process.
>
> How can I get around this and have a fair distribution of data across all
> reducers? I tried to enable the skewjoin optimization but I get the NPE
> below after the first step of the job is executed.
>
> Any suggestions/ideas will be of great help.
>
> Thanks,
> Shantian
>
> 2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
> 2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
> Ended Job = job_201106071542_0010
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask
> hive>
>
>