Pig, mail # dev - Review Request 20320: PIG:3855 Turn on UnionOptimizer by default and add new e2e tests for union


Re: Review Request 20320: PIG:3855 Turn on UnionOptimizer by default and add new e2e tests for union
Rohini Palaniswamy 2014-04-18, 19:59

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20320/

(Updated April 18, 2014, 7:59 p.m.)
Review request for pig, Cheolsoo Park and Daniel Dai.
Changes

Renamed the input class from CompositeShuffledMergedInput to SortedGroupedMergedInput, based on review comments in TEZ-1003.
Bugs: PIG-3855
    https://issues.apache.org/jira/browse/PIG-3855
Repository: pig
Description

Changes done:
- Created a new input in TEZ-1003 and used it so that UnionOptimizer can be turned on by default. Without it, we were seeing a lot of performance degradation in production scripts.
- Added a lot of e2e tests for UnionOptimizer and fixed code based on the issues found.
- Fixed a couple of other minor issues:
    - Default parallelism was not honored.
    - Serializing the full store was causing problems with some UDFs on deserialize for checkOutputSpecs.

This patch depends on TEZ-1003, so I will check it in once that is available as part of the Tez snapshot in Maven.
Diffs (updated)

  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/MultiQueryOptimizerTez.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POValueInputTez.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POValueOutputTez.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/optimizers/UnionOptimizer.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/tools/pigstats/tez/TezTaskStats.java 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/e2e/pig/tests/nightly.conf 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-MQ-2-OPTOFF.gld 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-MQ-2.gld 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-10-OPTOFF.gld PRE-CREATION
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-10.gld PRE-CREATION
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-2.gld 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-6-OPTOFF.gld 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-6.gld 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-7-OPTOFF.gld 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-7.gld 1588412
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-9-OPTOFF.gld PRE-CREATION
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-9.gld PRE-CREATION
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java 1588412

Diff: https://reviews.apache.org/r/20320/diff/
Testing

Unit tests and the new e2e tests pass.
Thanks,

Rohini Palaniswamy