Pig, mail # dev - Review Request 16313: PIG-3604 Implement replicated join in Tez


Re: Review Request 16313: PIG-3604 Implement replicated join in Tez
Cheolsoo Park 2013-12-18, 03:17


> On Dec. 17, 2013, 3:52 p.m., Rohini Palaniswamy wrote:
> > test/org/apache/pig/tez/TestTezCompiler.java, line 216
> > <https://reviews.apache.org/r/16313/diff/1/?file=398711#file398711line216>
> >
> >     Can we add cases for
> >      - three or four way join?
> >      - replicated table is part of a reduce output instead of being loaded directly. This is to handle the case where you don't create a separate vertex to broadcast, but broadcast from an existing vertex (POLocalRearrange) by just changing the edge type to broadcast. I don't think the TezCompiler handles this now.

I have just realized that I misunderstood the 2nd point. In my new patch, I handle the case where the *fragmented* table is a predecessor's output and the replicated join happens in the reducer. I don't yet handle the case where the *replicated* table is a predecessor's output. Can I handle it in a separate jira?
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16313/#review30533
-----------------------------------------------------------
On Dec. 18, 2013, 3:04 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16313/
> -----------------------------------------------------------
>
> (Updated Dec. 18, 2013, 3:04 a.m.)
>
>
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3604
>     https://issues.apache.org/jira/browse/PIG-3604
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implemented replicated join in Tez as follows:
> - POFRJoinTez extends POFRJoin. The difference between the two is that the replication hash table is constructed from broadcast edges in Tez instead of from files on the distributed cache in MR.
> - TezCompiler adds a vertex per replicated table and connects it to the POFRJoin vertex via a broadcast edge.
>
> Note that in POLocalRearrangeTez, I package tuples in the same way for broadcast and scatter/gather edges, so I removed outputType (DataMovementType).
>
>
> Diffs
> -----
>
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java d7c54d8
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java e900751
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java cda5d89
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java d76cfc5
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e6f9be5
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 7a1736a
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 2584501
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 96ccdde
>   test/e2e/pig/tests/tez.conf b280698
>   test/org/apache/pig/test/data/GoldenFiles/TEZC10.gld e69de29
>   test/org/apache/pig/test/data/GoldenFiles/TEZC11.gld e69de29
>   test/org/apache/pig/tez/TestTezCompiler.java 79dc94e
>
> Diff: https://reviews.apache.org/r/16313/diff/
>
>
> Testing
> -------
>
> Added a unit test case to TestTezCompiler.
> Added an e2e test case to Join.
>
> ant test-tez passes.
> e2e test passes.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
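
For readers unfamiliar with the technique, the replicated (fragment-replicate) join discussed in the description above can be sketched roughly as follows. This is only an illustration of the general idea, not the code in the patch: the class name ReplicatedJoinSketch, the methods buildHashTable and probe, the use of String[] rows, and joining on column 0 are all assumptions made for the example. Only inner-join behaviour is shown. In the patch, the rows for the hash table would arrive over a Tez broadcast edge rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a replicated hash join; not Pig's POFRJoin.
public class ReplicatedJoinSketch {

    // Build phase: load every row of the small (replicated) table into an
    // in-memory hash table keyed on the join column (assumed to be column 0).
    static Map<String, List<String[]>> buildHashTable(Iterable<String[]> replicatedRows) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : replicatedRows) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe phase: stream the large (fragmented) table and emit one joined
    // row per match found in the hash table.
    static List<String[]> probe(Map<String, List<String[]>> table,
                                Iterable<String[]> fragmentRows) {
        List<String[]> out = new ArrayList<>();
        for (String[] left : fragmentRows) {
            List<String[]> matches = table.get(left[0]);
            if (matches == null) {
                continue; // inner join: unmatched rows are dropped
            }
            for (String[] right : matches) {
                String[] joined = new String[left.length + right.length];
                System.arraycopy(left, 0, joined, 0, left.length);
                System.arraycopy(right, 0, joined, left.length, right.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = new ArrayList<>();
        small.add(new String[] {"1", "alice"});
        small.add(new String[] {"2", "bob"});

        List<String[]> big = new ArrayList<>();
        big.add(new String[] {"1", "click"});
        big.add(new String[] {"3", "view"});
        big.add(new String[] {"1", "buy"});

        for (String[] row : probe(buildHashTable(small), big)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Every task that processes a fragment of the large table needs the full hash table, which is why the small table has to be replicated to all of them, whether through the distributed cache or a broadcast edge.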