
Re: Review Request 16463: PIG-3636 Implement accumulator optimization in Tez


(Updated Dec. 27, 2013, 6:31 a.m.)
Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.

Incorporated Rohini's comments:
* Create home dir in Tez mini-cluster.
* Replace Random.nextInt() with UUID.randomUUID() in FileLocalizer. Unless setR(Random) is called, a UUID is used to generate temporary paths. Random and setR(Random) are kept so that TestMRCompiler does not break. I removed "r = new Random(1331); FileLocalizer.setR(r)" from TezCompiler but not from MRCompiler, because removing it there breaks TestMRCompiler.
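
For readers who have not looked at the diff, the fallback described in the second bullet can be sketched roughly as below. This is only an illustration, not the actual FileLocalizer code: the class name TempPathSketch, the method nextTempName(), and the "tmp" prefix are hypothetical, and only setR(Random) and the seed 1331 come from the notes above.

```java
import java.util.Random;
import java.util.UUID;

// Hypothetical stand-in for the temp-path logic described above;
// this is NOT the real FileLocalizer implementation.
class TempPathSketch {
    // Stays null unless a test injects a seeded Random through setR.
    private static Random r = null;

    // Kept so that tests comparing against golden files can get reproducible names.
    static void setR(Random random) {
        r = random;
    }

    // Returns a name that is unique by default and reproducible when a Random was injected.
    static String nextTempName() {
        if (r != null) {
            // Deterministic branch: the same seed yields the same sequence of names.
            return "tmp" + r.nextInt();
        }
        // Default branch: a random UUID avoids collisions between concurrent runs.
        return "tmp-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(nextTempName());   // UUID-based name
        setR(new Random(1331));
        System.out.println(nextTempName());   // seeded, reproducible name
    }
}
```

With this shape, code paths that never call setR get collision-resistant names, while a test that seeds the generator still sees a stable sequence.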

Note that I still copy List<Object> to List<NullableTuple> in POShuffleTezLoad for the reason that I explained in the TODO comment.
Bugs: PIG-3636
Repository: pig-git

The patch implements accumulator optimization in Tez. The changes include:
* Create AccumulatorOptimizer in Tez.
* Create AccumulatorOptimizerUtil class and factor out common functions in MR and Tez.
* Implement accumulator logic in POShuffleTezLoad.
* Update TestAccumulator to make it run in Tez mode.
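
As background on what the optimization buys, the general idea of accumulator mode is to feed the values of one key to a function in batches instead of materializing them all at once. The sketch below illustrates that idea only. The Accumulator interface here is a simplified stand-in declared locally, and runInBatches, SumAccumulator, and the batch size are hypothetical; none of this is the code in POShuffleTezLoad.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of batch-wise accumulation; all names here are hypothetical.
class AccumulatorSketch {
    // Simplified stand-in for a function that can consume its input incrementally.
    interface Accumulator<T> {
        void accumulate(List<Long> batch); // consume the next batch of values
        T getValue();                      // result over everything seen so far
        void cleanup();                    // reset state before the next key
    }

    static class SumAccumulator implements Accumulator<Long> {
        private long sum = 0;

        public void accumulate(List<Long> batch) {
            for (long v : batch) {
                sum += v;
            }
        }

        public Long getValue() {
            return sum;
        }

        public void cleanup() {
            sum = 0;
        }
    }

    // Feeds the values of a single key in fixed-size batches (batchSize must be > 0),
    // so at most batchSize values are held in memory at a time.
    static <T> T runInBatches(Iterator<Long> values, int batchSize, Accumulator<T> acc) {
        List<Long> batch = new ArrayList<>(batchSize);
        while (values.hasNext()) {
            batch.add(values.next());
            if (batch.size() == batchSize) {
                acc.accumulate(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            acc.accumulate(batch); // flush the final partial batch
        }
        T result = acc.getValue();
        acc.cleanup();
        return result;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        System.out.println(runInBatches(values.iterator(), 2, new SumAccumulator()));
    }
}
```

The result is the same as summing the whole group at once; the difference is only in how much of the group has to be held in memory at a time.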
Diffs (updated)

  shims/test/hadoop23/org/apache/pig/test/TezMiniCluster.java 0b4e7b0
  src/org/apache/pig/PigConfiguration.java 0a26e8c
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/AccumulatorOptimizer.java 7f9e15a
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 6e04513
  src/org/apache/pig/backend/hadoop/executionengine/tez/AccumulatorOptimizer.java e69de29
  src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 722b9f6
  src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java 742a33a
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 13a97ca
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java d42ce89
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java c6af682
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java e33a7c6
  src/org/apache/pig/backend/hadoop/executionengine/util/AccumulatorOptimizerUtil.java e69de29
  src/org/apache/pig/impl/io/FileLocalizer.java f10360b
  test/org/apache/pig/test/TestAccumulator.java b979649
  test/org/apache/pig/test/TestCombiner.java a227d18
  test/tez-tests fcb573e

Diff: https://reviews.apache.org/r/16463/diff/

* TestAccumulator passes in Tez mode.
* All unit tests pass.
* All e2e tests pass.

Note that 3 test cases in TestAccumulator are annotated with @Ignore because SecondaryKeyOptimizer is not implemented in Tez yet. These test cases expect the accumulator optimizer to be applied when order-by and distinct are present in a nested foreach, because the sort operator is removed by SecondaryKeyOptimizer. Added TODO comments accordingly.

Cheolsoo Park