Review Request 16313: PIG-3604 Implement replicated join in Tez

This is an automatically generated e-mail. To reply, visit:

Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Bugs: PIG-3604
Repository: pig-git

Implemented replicated join in Tez as follows:
- POFRJoinTez extends POFRJoin. The difference between the two is that the replication hash table is built from tuples received over broadcast edges in Tez, rather than from files in the distributed cache as in MR.
- TezCompiler adds one vertex per replicated table and connects it to the POFRJoin vertex via a broadcast edge.
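
For readers unfamiliar with replicated (fragment-replicate) joins, the build-and-probe idea behind the points above can be sketched roughly as follows. This is only an illustration in plain Java, not the code in this patch: the class name ReplicatedJoinSketch, the methods buildHashTable and probe, and the use of String[] rows are all hypothetical, and no Pig or Tez APIs are used. The sketch assumes an inner join on a single key column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not Pig's POFRJoin/POFRJoinTez.
class ReplicatedJoinSketch {

    // Build phase: index every row of the small (replicated) relation by join key.
    // In the Tez plan described above, these rows would arrive over a broadcast
    // edge; in MR they would be read from files in the distributed cache.
    static Map<String, List<String[]>> buildHashTable(List<String[]> replicated, int keyIndex) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : replicated) {
            table.computeIfAbsent(row[keyIndex], k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe phase: stream the large (fragmented) relation and look up each key
    // in the in-memory table. Unmatched rows are dropped (inner join).
    static List<String[]> probe(List<String[]> fragment, int keyIndex,
                                Map<String, List<String[]>> table) {
        List<String[]> out = new ArrayList<>();
        for (String[] left : fragment) {
            List<String[]> matches = table.get(left[keyIndex]);
            if (matches == null) {
                continue;
            }
            for (String[] right : matches) {
                // Concatenate the left row and the matching replicated row.
                String[] joined = new String[left.length + right.length];
                System.arraycopy(left, 0, joined, 0, left.length);
                System.arraycopy(right, 0, joined, left.length, right.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(
                new String[] {"1", "alice"}, new String[] {"2", "bob"});
        List<String[]> large = Arrays.asList(
                new String[] {"1", "click"}, new String[] {"3", "view"},
                new String[] {"1", "buy"});
        for (String[] row : probe(large, 0, buildHashTable(small, 0))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because every join task needs the whole small relation, the only thing that changes between backends in this picture is how the input to the build phase is delivered; the probe loop stays the same.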

Note that in POLocalRearrangeTez, I package tuples the same way for both broadcast and scatter/gather edges, so I removed outputType (DataMovementType).

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java d7c54d8
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java e900751
  src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java e69de29
  src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java cda5d89
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 7a1736a
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 2584501
  test/e2e/pig/tests/tez.conf b280698
  test/org/apache/pig/test/data/GoldenFiles/TEZC10.gld e69de29
  test/org/apache/pig/tez/TestTezCompiler.java 79dc94e

Diff: https://reviews.apache.org/r/16313/diff/

Added a unit test case to TestTezCompiler.
Added an e2e test case to Join.

ant test-tez passes.
e2e test passes.

Cheolsoo Park