Hive, mail # dev - Review Request 16728: Implement non-staged MapJoin


Re: Review Request 16728: Implement non-staged MapJoin
Vikram Dixit Kumaraswamy 2014-01-27, 21:52

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/#review32885
-----------------------------------------------------------

itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
<https://reviews.apache.org/r/16728/#comment61876>

    This would break the Tez tests.

itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
<https://reviews.apache.org/r/16728/#comment61875>

    This would eliminate the Tez unit tests. Was this intentional?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
<https://reviews.apache.org/r/16728/#comment61880>

    Could you raise a JIRA for this?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
<https://reviews.apache.org/r/16728/#comment61882>

    Can it be only these two operators? Maybe the common join operator could be used?

- Vikram Dixit Kumaraswamy

On Jan. 20, 2014, 5 a.m., Navis Ryu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16728/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2014, 5 a.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-6144
>     https://issues.apache.org/jira/browse/HIVE-6144
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> For a map join, all data in the small aliases is hashed and stored in a temporary file by MapRedLocalTask. But for aliases without a filter or projection, this does not seem necessary. For example:
>
> {noformat}
> select a.* from src a join src b on a.key=b.key;
> {noformat}
>
> This produces the following plan:
> {noformat}
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a
>           TableScan
>             alias: a
>             HashTable Sink Operator
>               condition expressions:
>                 0 {key} {value}
>                 1
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               Position of Big Table: 1
>
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                 File Output Operator
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
> {noformat}
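
To make the difference concrete, here is a minimal Python sketch of the two flows, not Hive code. It assumes rows are plain tuples with the join key in column 0; the function names (`build_hash_table`, `probe`, `staged_map_join`, `direct_map_join`) and the pickle-based temp file are hypothetical stand-ins for the local task's hash table dump and the map task's load.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def build_hash_table(small_rows, key_index=0):
    """Hash the small alias on the join key (illustrative only)."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    return dict(table)


def probe(table, big_rows, key_index=0):
    """Stream the big alias and emit the matching small-alias rows (select a.*)."""
    out = []
    for big in big_rows:
        for small in table.get(big[key_index], []):
            out.append(small)
    return out


def staged_map_join(small_rows, big_rows):
    """Staged flow: a separate local step dumps the hash table to a temp file,
    and the map step reloads it before probing."""
    table = build_hash_table(small_rows)
    fd, path = tempfile.mkstemp(suffix=".hashtable")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(table, f)          # stand-in for the local task's output
        with open(path, "rb") as f:
            loaded = pickle.load(f)        # stand-in for the map task's load
    finally:
        os.remove(path)
    return probe(loaded, big_rows)


def direct_map_join(small_rows, big_rows):
    """Non-staged flow: the map step builds the hash table itself, no temp file."""
    return probe(build_hash_table(small_rows), big_rows)


a = [("1", "x"), ("2", "y")]               # small alias
b = [("1", "p"), ("1", "q"), ("3", "r")]   # big alias
```

Both functions return the same rows for the same input; the sketch only shows that the intermediate file adds a write and a read when the small alias is stored unchanged.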
>
> Table src (alias a) is fetched and stored as-is in MRLocalTask. With this patch, the plan can look like this:
> {noformat}
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                   File Output Operator
>       Local Work:
>         Map Reduce Local Work
>           Alias -> Map Local Tables:
>             a
>               Fetch Operator
>                 limit: -1
>           Alias -> Map Local Operator Tree: