Pig, mail # dev - Review Request 16313: PIG-3604 Implement replicated join in Tez


Re: Review Request 16313: PIG-3604 Implement replicated join in Tez
Cheolsoo Park 2013-12-18, 03:04

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16313/
-----------------------------------------------------------

(Updated Dec. 18, 2013, 3:04 a.m.)
Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Changes
-------

Incorporated Rohini's comments:
* Adds an e2e test case for outer replicated join.
* Adds a unit test case for 3-way replicated join.
* Adds a unit test case for replicated join in reducer.
* Cleans up POShuffleTezLoad code to make use of inputKeys. Now POShuffleTezLoad#attachInputs() looks up LogicalInputs by inputKey and only attaches applicable ones to itself. For example, it attaches ShuffledMergedInputs but ignores ShuffledUnorderedKVInputs. This is needed because it is possible for both broadcast and scatter/gather edges to be attached to the same vertex. In that case, we should only attach applicable inputs to different operators in the vertex.
* Includes the fix for PIG-3624 (establishing the order of joined columns).
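
The input selection described for POShuffleTezLoad#attachInputs() can be sketched roughly as below. This is only an illustration, not the code in the patch: the LogicalInput, ShuffledInput and BroadcastInput types are hypothetical stand-ins for the Tez classes, and select() and the key names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InputSelectionSketch {
    // Hypothetical stand-ins for Tez input classes (not the real Tez API).
    interface LogicalInput {}
    static class ShuffledInput implements LogicalInput {}   // scatter/gather edge
    static class BroadcastInput implements LogicalInput {}  // broadcast edge

    // Look up each expected key and keep only inputs of the wanted type.
    // Missing keys and inputs of other types are skipped.
    static <T extends LogicalInput> List<T> select(
            Map<String, LogicalInput> inputs, List<String> inputKeys, Class<T> wanted) {
        List<T> attached = new ArrayList<>();
        for (String key : inputKeys) {
            LogicalInput in = inputs.get(key);
            if (wanted.isInstance(in)) {
                attached.add(wanted.cast(in));
            }
        }
        return attached;
    }

    public static void main(String[] args) {
        // One vertex with both kinds of incoming edges.
        Map<String, LogicalInput> inputs = new LinkedHashMap<>();
        inputs.put("scope-10", new ShuffledInput());
        inputs.put("scope-20", new BroadcastInput());

        List<ShuffledInput> shuffled =
                select(inputs, Arrays.asList("scope-10", "scope-20"), ShuffledInput.class);
        System.out.println("attached shuffled inputs: " + shuffled.size());
    }
}
```

The point of filtering by both key and type is that a shuffle operator and a replicated-join operator in the same vertex each pick up only their own edges.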

ant test-tez passes.
e2e test passes.
Bugs: PIG-3604
    https://issues.apache.org/jira/browse/PIG-3604
Repository: pig-git
Description
-------

Implemented replicated join in Tez as follows:
- POFRJoinTez extends POFRJoin. The difference between the two is that in Tez the replication hash table is built from broadcast edges, whereas in MR it is built from files on the distributed cache.
- TezCompiler adds a vertex per replicated table and connects it to the POFRJoin vertex via a broadcast edge.
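
For readers unfamiliar with replicated joins, the general idea is sketched below. This is not the POFRJoinTez code: in-memory lists stand in for the tuples arriving over the broadcast edge and for the streamed relation, and the class and method names are hypothetical. It shows an inner join only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplicatedJoinSketch {
    // Build the replication hash table from the small relation.
    // In Tez these rows would be read from a broadcast edge (assumption:
    // here a plain list stands in for that input).
    static Map<String, List<String[]>> buildTable(List<String[]> replicated, int keyIdx) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : replicated) {
            table.computeIfAbsent(row[keyIdx], k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Stream the large relation and probe the table; emit joined rows.
    static List<String[]> join(List<String[]> big, int keyIdx,
                               Map<String, List<String[]>> table) {
        List<String[]> out = new ArrayList<>();
        for (String[] left : big) {
            List<String[]> matches = table.get(left[keyIdx]);
            if (matches == null) {
                continue; // inner join: unmatched rows are dropped
            }
            for (String[] right : matches) {
                String[] joined = Arrays.copyOf(left, left.length + right.length);
                System.arraycopy(right, 0, joined, left.length, right.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(
                new String[] {"1", "alice"},
                new String[] {"2", "bob"});
        List<String[]> big = Arrays.asList(
                new String[] {"1", "x"},
                new String[] {"3", "y"},
                new String[] {"2", "z"});

        for (String[] row : join(big, 0, buildTable(small, 0))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because every task holds the whole small table in memory, the large relation never has to be shuffled by join key, which is what makes the broadcast edge a natural fit.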

Note that in POLocalRearrangeTez, I package tuples in the same way for broadcast and scatter/gather edges, so I removed outputType (DataMovementType).
Diffs (updated)
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java d7c54d8
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java e900751
  src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java e69de29
  src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java cda5d89
  src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java d76cfc5
  src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e6f9be5
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 7a1736a
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 2584501
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 96ccdde
  test/e2e/pig/tests/tez.conf b280698
  test/org/apache/pig/test/data/GoldenFiles/TEZC10.gld e69de29
  test/org/apache/pig/test/data/GoldenFiles/TEZC11.gld e69de29
  test/org/apache/pig/tez/TestTezCompiler.java 79dc94e

Diff: https://reviews.apache.org/r/16313/diff/
Testing
-------

Added a unit test case to TestTezCompiler.
Added an e2e test case to Join.

ant test-tez passes.
e2e test passes.
Thanks,

Cheolsoo Park