Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # dev - Review Request 15261: PIG-3555 Initial implementation of Tez combiner optimization


Copy link to this message
-
Re: Review Request 15261: PIG-3555 Initial implementation of Tez combiner optimization
Cheolsoo Park 2013-11-08, 07:37


> On Nov. 8, 2013, 12:55 a.m., Alex Bain wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java, line 63
> > <https://reviews.apache.org/r/15261/diff/2/?file=380233#file380233line63>
> >
> >     In this if-branch, is it not possible for currOp to also end in a POLocalRearrange, i.e. don't you want to check if you should set prevOp=currOp?
> >    
> >     In the case of Map-Reduce-Reduce (implemented as Tez vertices), would not the middle Reduce start with a POPackage but then itself end in a POLocalRearrange?

Alex, you're totally right. I rewrote CombinerOptimizer and use TezOperPlan.getPredecessor() to retrieve predecessor vertices. This way, I can easily detect the pattern across two vertices.
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15261/#review28472
-----------------------------------------------------------
On Nov. 6, 2013, 11:55 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15261/
> -----------------------------------------------------------
>
> (Updated Nov. 6, 2013, 11:55 p.m.)
>
>
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3555
>     https://issues.apache.org/jira/browse/PIG-3555
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Initial implementation of Tez combiner optimizer. The patch includes the following changes-
> * Factored out CombinerOptimizer code into a utility class called CombinerOptimizerUtil. So both MR and Tez CombinerOptimizer use this utility class instead of duplicating code.
> * Introduced a new class called TezEdgeDescriptor that holds combine plans as well as various edge properties.
> * Added TezEdgeDescriptors to TezOperator. Note that I added multiple descriptors for inbound edges but a single descriptor for all the outbound edges. This is because TezDagBuilder always creates an edge by connecting predecessors to the current vertex. Please let me know if you think we should allow multiple descriptors for outbound edges too.
> * Refactored some code in TezDagBuilder while touching it.
>
>
> Diffs
> -----
>
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java 18a382b
>   src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0b1f3c9
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 45e47b0
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 3f14644
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java e612d88
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java 5a42ded
>   src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java e69de29
>   test/org/apache/pig/test/data/GoldenFiles/TEZC1.gld 925f07e
>   test/org/apache/pig/test/data/GoldenFiles/TEZC2.gld a3974fe
>   test/org/apache/pig/test/data/GoldenFiles/TEZC3.gld a8c942b
>   test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld fb7c903
>   test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld e6cd25e
>
> Diff: https://reviews.apache.org/r/15261/diff/
>
>
> Testing
> -------
>
> ant test-tez passes.
> ant test-e2e-tez passes.
>
> I didn't add new test cases, but an e2e test case (Checkin_3) includes an algebraic udf (count) following group-by. I also manually tested it on a live cluster.
>
>
> Thanks,
>
> Cheolsoo Park
>
>