Hive >> mail # dev >> Review Request 14953: Pushdown join conditions


Re: Review Request 14953: Pushdown join conditions


> On Oct. 30, 2013, 7:43 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/join_cond_pushdown_1.q.out, lines 497-500
> > <https://reviews.apache.org/r/14953/diff/1/?file=371573#file371573line497>
> >
> >     If I am reading this right, this is a cross join followed by a filter. An alternate (more efficient) plan would have been to use this filter condition as the join condition: while doing the join we have access to columns from both sides, so the join would only generate rows that satisfy the condition. Makes sense?
> >     It's OK that this patch is not doing it, but shall we open a follow-up JIRA to optimize such cases later?

- The lhs of the filter refers to both joining tables.
- Filters in the JoinOp must be on only one table; they are applied to the input rows. There is no concept of filtering intermediate/output rows in the JoinOp.
- This is similar to the issue with merging join trees; I sent you an email about this earlier today.

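
The constraint described in this reply can be sketched as a small classification: a predicate that touches only one join input can be applied to that input's rows, an equality whose two sides each come entirely from a different input can act as a join key, and anything else has to be evaluated on the joined rows. The following is a minimal Java sketch under those assumptions; `Placement`, `JoinPredicate`, and `PredicatePlacement` are hypothetical names for illustration, not Hive classes, and predicates are reduced to the table aliases each side references.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical: where a predicate can be evaluated relative to a join of two inputs.
enum Placement { INPUT_FILTER, JOIN_KEY, POST_JOIN_FILTER }

// Hypothetical predicate model: aliases referenced by each side, and whether it is "lhs = rhs".
record JoinPredicate(Set<String> lhsAliases, Set<String> rhsAliases, boolean isEquality) {}

class PredicatePlacement {
  static Placement classify(JoinPredicate p, Set<String> left, Set<String> right) {
    Set<String> all = new HashSet<>(p.lhsAliases());
    all.addAll(p.rhsAliases());
    // Only one input is referenced: it can filter that input's rows inside the join.
    if (left.containsAll(all) || right.containsAll(all)) {
      return Placement.INPUT_FILTER;
    }
    // Equality whose sides each come entirely from a different input: usable as a join key.
    boolean bothSidesPresent = !p.lhsAliases().isEmpty() && !p.rhsAliases().isEmpty();
    boolean crossesInputs =
        (left.containsAll(p.lhsAliases()) && right.containsAll(p.rhsAliases()))
            || (right.containsAll(p.lhsAliases()) && left.containsAll(p.rhsAliases()));
    if (p.isEquality() && bothSidesPresent && crossesInputs) {
      return Placement.JOIN_KEY;
    }
    // Otherwise (e.g. one side mixes columns from both inputs) it needs the joined rows,
    // so it stays as a Filter above the join.
    return Placement.POST_JOIN_FILTER;
  }

  public static void main(String[] args) {
    Set<String> left = Set.of("a");
    Set<String> right = Set.of("b");
    // a.x > 5
    System.out.println(classify(new JoinPredicate(Set.of("a"), Set.of(), false), left, right)); // prints INPUT_FILTER
    // a.k = b.k
    System.out.println(classify(new JoinPredicate(Set.of("a"), Set.of("b"), true), left, right)); // prints JOIN_KEY
    // a.x + b.y = 10: the lhs refers to both joining tables
    System.out.println(classify(new JoinPredicate(Set.of("a", "b"), Set.of(), true), left, right)); // prints POST_JOIN_FILTER
  }
}
```

The third case corresponds to the situation discussed here: because the lhs mixes columns from both tables, the predicate can be neither a per-input filter nor a join key, so it remains a filter over the cross join's output.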
> On Oct. 30, 2013, 7:43 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/join_cond_pushdown_2.q.out, line 285
> > <https://reviews.apache.org/r/14953/diff/1/?file=371574#file371574line285>
> >
> >     It seems that here mergeJoinTrees didn't generate the most efficient join trees, as is evident from the MR jobs: 3 jobs were generated where 2 would have sufficed. The first job would join p1, p2, p3 on the name column, and then the output of that job would be joined with p4 on partkey.
> >     It seems that for this to happen we need to identify the transitivity of join conditions across different joins. Hopefully, Optiq will be able to help us out. I'm also curious whether hive.optimize.correlation can help here; can you test with that config on? It would be good to create a follow-up JIRA for this as well, if there is currently no way to coerce Hive into generating an efficient plan for this.
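
Identifying transitivity of equi-join conditions amounts to grouping columns into equivalence classes. Below is a minimal Java sketch using union-find, assuming the query's conditions are p1/p2 on name and partkey, p2/p3 on name, and p1/p4 on partkey, written with shorthand column names (`p1.name`, `p2.partkey`, ...) rather than the test's actual schema. `EquiJoinClasses` is a hypothetical helper, not how Hive or Optiq implement this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: groups columns that are transitively equal under equi-join conditions.
class EquiJoinClasses {
  private final Map<String, String> parent = new HashMap<>();

  // Returns the representative of c's class, compressing the path along the way.
  private String find(String c) {
    parent.putIfAbsent(c, c);
    String p = parent.get(c);
    if (p.equals(c)) {
      return c;
    }
    String root = find(p);
    parent.put(c, root);
    return root;
  }

  // Records one equality join condition, e.g. "p1.name" = "p2.name".
  void addEquality(String a, String b) {
    parent.put(find(a), find(b));
  }

  // Returns each equivalence class as a sorted list, ordered by its first member.
  List<List<String>> classes() {
    Map<String, List<String>> byRoot = new TreeMap<>();
    for (String col : new TreeSet<>(parent.keySet())) {
      byRoot.computeIfAbsent(find(col), r -> new ArrayList<>()).add(col);
    }
    List<List<String>> result = new ArrayList<>(byRoot.values());
    result.sort((x, y) -> x.get(0).compareTo(y.get(0)));
    return result;
  }

  public static void main(String[] args) {
    EquiJoinClasses eq = new EquiJoinClasses();
    eq.addEquality("p1.name", "p2.name");       // p1, p2 on name
    eq.addEquality("p1.partkey", "p2.partkey"); // p1, p2 on partkey
    eq.addEquality("p2.name", "p3.name");       // p2, p3 on name
    eq.addEquality("p1.partkey", "p4.partkey"); // p1, p4 on partkey
    // prints [[p1.name, p2.name, p3.name], [p1.partkey, p2.partkey, p4.partkey]]
    System.out.println(eq.classes());
  }
}
```

Each class lists columns that are equal in every output row; for example, p1.name and p3.name end up in the same class even though no condition relates them directly.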

The join conditions are:
- p1, p2 on name, partkey
- p2, p3 on name
- p1, p4 on partkey

So it cannot join p1, p2, and p3 in one job, right?
- Harish
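
The reply relies on Hive's join documentation, which says that joins over multiple tables become a single map/reduce job only if every table uses the same join column(s) in all the join clauses it appears in. Here p2 is joined on (name, partkey) with p1 but on name alone with p3. Below is a minimal Java sketch of that check under this simplified reading of the rule. `JoinEdge` and `OneJobCheck` are hypothetical names, and the sketch ignores other planner factors such as map-join conversion or hive.optimize.correlation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical: an equi-join between two tables on positionally paired key columns.
record JoinEdge(String leftTable, List<String> leftKeys, String rightTable, List<String> rightKeys) {}

class OneJobCheck {
  // Simplified rule: the joins can share one MR job only if each table
  // uses the same key columns in every join it takes part in.
  static boolean canShareOneJob(List<JoinEdge> edges) {
    Map<String, Set<List<String>>> keysPerTable = new HashMap<>();
    for (JoinEdge e : edges) {
      keysPerTable.computeIfAbsent(e.leftTable(), t -> new HashSet<>()).add(e.leftKeys());
      keysPerTable.computeIfAbsent(e.rightTable(), t -> new HashSet<>()).add(e.rightKeys());
    }
    for (Set<List<String>> keySets : keysPerTable.values()) {
      if (keySets.size() > 1) {
        return false; // this table would need two different shuffle keys
      }
    }
    return true;
  }

  public static void main(String[] args) {
    JoinEdge p1p2 = new JoinEdge("p1", List.of("name", "partkey"), "p2", List.of("name", "partkey"));
    JoinEdge p2p3 = new JoinEdge("p2", List.of("name"), "p3", List.of("name"));
    JoinEdge p1p2OnName = new JoinEdge("p1", List.of("name"), "p2", List.of("name"));
    // p2 is keyed on (name, partkey) in one join and on (name) in the other.
    System.out.println(canShareOneJob(List.of(p1p2, p2p3))); // prints false
    // If p1 and p2 were joined on name only, all three could share the name key.
    System.out.println(canShareOneJob(List.of(p1p2OnName, p2p3))); // prints true
  }
}
```
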
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14953/#review27806
-----------------------------------------------------------
On Oct. 29, 2013, 9:19 p.m., Harish Butani wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14953/
> -----------------------------------------------------------
>
> (Updated Oct. 29, 2013, 9:19 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Vikram Dixit Kumaraswamy.
>
>
> Bugs: hive-5556
>     https://issues.apache.org/jira/browse/hive-5556
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Step 1 toward supporting the alternate join syntax (HIVE-5555).
>
> This patch also contains fixes to the merging of QBJoinTrees.
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QBJoinTree.java 9c8cac1
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cf0c895
>   ql/src/test/org/apache/hadoop/hive/ql/parse/TestQBJoinTreeApplyPredicate.java PRE-CREATION
>   ql/src/test/queries/clientpositive/join_cond_pushdown_1.q PRE-CREATION
>   ql/src/test/queries/clientpositive/join_cond_pushdown_2.q PRE-CREATION
>   ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 865627b
>   ql/src/test/results/clientpositive/join_cond_pushdown_1.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/join_cond_pushdown_2.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/14953/diff/
>
>
> Testing
> -------
>
> Ran all join .q files.
> Added the join_cond_pushdown_1.q and join_cond_pushdown_2.q tests.
> Added the TestQBJoinTreeApplyPredicate unit test to test the pushdown functionality.
>
>
> Thanks,
>
> Harish Butani
>
>