Pig, mail # dev - Review Request 16463: PIG-3636 Implement accumulator optimization in Tez


Re: Review Request 16463: PIG-3636 Implement accumulator optimization in Tez
Rohini Palaniswamy 2013-12-27, 06:41

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16463/#review30905
-----------------------------------------------------------

Ship it!

- Rohini Palaniswamy
On Dec. 27, 2013, 6:31 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16463/
> -----------------------------------------------------------
>
> (Updated Dec. 27, 2013, 6:31 a.m.)
>
>
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3636
>     https://issues.apache.org/jira/browse/PIG-3636
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> The patch implements accumulator optimization in Tez. The changes include:
> * Create AccumulatorOptimizer in Tez.
> * Create AccumulatorOptimizerUtil class and factor out common functions in MR and Tez.
> * Implement accumulator logic in POShuffleTezLoad.
> * Update TestAccumulator to make it run in Tez mode.
>
>
> Diffs
> -----
>
>   shims/test/hadoop23/org/apache/pig/test/TezMiniCluster.java 0b4e7b0
>   src/org/apache/pig/PigConfiguration.java 0a26e8c
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/AccumulatorOptimizer.java 7f9e15a
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 6e04513
>   src/org/apache/pig/backend/hadoop/executionengine/tez/AccumulatorOptimizer.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 722b9f6
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java 742a33a
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 13a97ca
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java d42ce89
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java c6af682
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java e33a7c6
>   src/org/apache/pig/backend/hadoop/executionengine/util/AccumulatorOptimizerUtil.java e69de29
>   src/org/apache/pig/impl/io/FileLocalizer.java f10360b
>   test/org/apache/pig/test/TestAccumulator.java b979649
>   test/org/apache/pig/test/TestCombiner.java a227d18
>   test/tez-tests fcb573e
>
> Diff: https://reviews.apache.org/r/16463/diff/
>
>
> Testing
> -------
>
> * TestAccumulator passes in Tez mode.
> * All unit tests pass.
> * All e2e tests pass.
>
> Note that 3 test cases in TestAccumulator are annotated with @Ignore because SecondaryKeyOptimizer is not implemented in Tez yet. These test cases expect the accumulator optimizer to be applied when order-by and distinct are present in a nested foreach, which only happens once SecondaryKeyOptimizer has removed the sort operator. Added TODO comments accordingly.
>
>
> Thanks,
>
> Cheolsoo Park
>
>