Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Skew Join Optimization in hive


Copy link to this message
-
Re: Skew Join Optimization in hive
Have you tried splitting the query into 2 or 3 steps and/or enabling map
jons (SET hive.auto.convert.join = true;) if some of the tables are
smallish?
On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad
<[EMAIL PROTECTED]>wrote:

> Hi,
>
> I have a query which joins 12 different tables (most of them left outer
> joins) and the query takes almost 3 hours. 90% of the time is taken by a
> single reducer. One reducer is getting bulk of the data to process.
>
> How can I get around this and have fair distribution of data across all
> reducers? I tried to enable the skewjoin optimization but getting below NPE
> after first step of the job is executed.
>
> Any suggestions/ideas will be or great help.
>
> Thanks,
> Shantian
>
> 2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
> 2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
> Ended Job = job_201106071542_0010
> java.lang.NullPointerException
>     at
> org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
>     at
> org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>     at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> FAILED: Execution Error, return code -101 from
> org.apache.hadoop.hive.ql.exec.ConditionalTask
> hive>
>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB