Pig >> mail # dev >> Review Request 15261: PIG-3555 Initial implementation of Tez combiner optimization


Cheolsoo Park 2013-11-06, 11:04
Cheolsoo Park 2013-11-06, 17:00
Cheolsoo Park 2013-11-06, 23:55
Alex Bain 2013-11-08, 00:55
Cheolsoo Park 2013-11-08, 07:37
Cheolsoo Park 2013-11-08, 07:57
Rohini Palaniswamy 2013-11-08, 23:12
Cheolsoo Park 2013-11-09, 00:33
Cheolsoo Park 2013-11-10, 04:31
Mark Wagner 2013-11-11, 21:16
Cheolsoo Park 2013-11-11, 21:41
Mark Wagner 2013-11-06, 22:54
Re: Review Request 15261: PIG-3555 Initial implementation of Tez combiner optimization


> On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, line 180
> > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line180>
> >
> >     Does this work properly? I was thinking that the key class, comparator, etc. would also need to be a part of this conf. They're not right now, but I thought Tez was falling back if no edge conf was present.

The key class, comparator, etc. are added to the payload of edges by the setIntermediateInputKeyValue() and setIntermediateOutputKeyValue() calls in addCombiner().

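To illustrate why both calls are needed: a Tez edge is configured with an output descriptor on the producing vertex and an input descriptor on the consuming vertex, each carrying its own payload, so the key settings have to be written to both sides. The following is a minimal sketch, assuming plain `Map<String, String>` objects stand in for the two payloads. The class `EdgePayloadSketch`, its methods, and the property names are hypothetical and only mimic the roles of setIntermediateOutputKeyValue(), setIntermediateInputKeyValue(), and addCombiner(); they are not Pig's or Tez's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Pig's or Tez's API: plain maps stand in for the
// output-side and input-side payloads of a single edge.
class EdgePayloadSketch {
    static final String KEY_CLASS = "intermediate.key.class";           // made-up name
    static final String KEY_COMPARATOR = "intermediate.key.comparator"; // made-up name

    final Map<String, String> outputPayload = new HashMap<>(); // producer side
    final Map<String, String> inputPayload = new HashMap<>();  // consumer side

    // Plays the role of setIntermediateOutputKeyValue(): configures the producer side.
    void setOutputKeyValue(String keyClass, String comparator) {
        outputPayload.put(KEY_CLASS, keyClass);
        outputPayload.put(KEY_COMPARATOR, comparator);
    }

    // Plays the role of setIntermediateInputKeyValue(): configures the consumer side.
    void setInputKeyValue(String keyClass, String comparator) {
        inputPayload.put(KEY_CLASS, keyClass);
        inputPayload.put(KEY_COMPARATOR, comparator);
    }

    // Simplified stand-in for addCombiner(): both sides receive the same key
    // class and comparator, so producer and consumer agree on the key format.
    void addCombiner(String keyClass, String comparator) {
        setOutputKeyValue(keyClass, comparator);
        setInputKeyValue(keyClass, comparator);
    }

    boolean sidesAgree() {
        return outputPayload.equals(inputPayload);
    }

    public static void main(String[] args) {
        EdgePayloadSketch edge = new EdgePayloadSketch();
        edge.addCombiner("com.example.Key", "com.example.KeyComparator");
        System.out.println(edge.sidesAgree()); // prints "true"
    }
}
```

If only one of the two setters were called, `sidesAgree()` would return false, which corresponds to the mismatch the original question was worried about.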
> On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java, line 76
> > <https://reviews.apache.org/r/15261/diff/1/?file=379007#file379007line76>
> >
> >     Does this still hold true for Tez? What about a load + group bys on different keys?

True. Can we punt on this for now?

> On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, lines 306-313
> > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line306>
> >
> >     Move to edge creation

Right now, the vertex pipeline still has POPackage in its plan, so setting the input keys in the vertex payload is still necessary.

> On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, line 297
> > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line297>
> >
> >     This should go with the rest of the edge creation. Anything that's related to shuffles or input/outputs needs to be done per edge.

Now I create a new configuration object per edge and set up the input/output keys there. However, I still need to set the input/output keys in the vertex as well, because I do not eliminate the local rearrange and package operators from its pipeline. Please correct me if I am misunderstanding.

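The arrangement described above, a separate configuration object per edge plus input keys that are still recorded on the consuming vertex, can be sketched as follows. This is a minimal illustration assuming `java.util.Properties` stands in for the Hadoop/Tez configuration objects; `EdgeConfSketch`, `withKeyClass`, and the property names are hypothetical. The point of copying shared settings into a fresh object per edge is that one producer feeding two consumers that group on different keys cannot have one edge's key class overwrite the other's.

```java
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for the Hadoop/Tez
// configuration objects; the property names below are made up.
class EdgeConfSketch {
    static final String KEY_CLASS = "intermediate.key.class"; // hypothetical

    // Returns a new configuration seeded from the shared settings, so setting
    // the key class here never touches the shared object or any other copy.
    static Properties withKeyClass(Properties shared, String keyClass) {
        Properties conf = new Properties();
        conf.putAll(shared);
        conf.setProperty(KEY_CLASS, keyClass);
        return conf;
    }

    public static void main(String[] args) {
        Properties shared = new Properties();
        shared.setProperty("job.name", "example"); // hypothetical shared setting

        // One producer feeding two consumers that group on different keys:
        // each edge gets its own configuration object.
        Properties edgeToA = withKeyClass(shared, "KeyA");
        Properties edgeToB = withKeyClass(shared, "KeyB");

        // The consuming vertex still records its input key class, because its
        // pipeline keeps the local rearrange and package operators.
        Properties vertexA = withKeyClass(shared, "KeyA");

        System.out.println(edgeToA.getProperty(KEY_CLASS)); // prints "KeyA"
        System.out.println(edgeToB.getProperty(KEY_CLASS)); // prints "KeyB"
        System.out.println(vertexA.getProperty(KEY_CLASS)); // prints "KeyA"
        System.out.println(shared.getProperty(KEY_CLASS));  // prints "null"
    }
}
```

Once the local rearrange and package operators are removed from the vertex pipeline, the vertex-level key setting in this sketch would presumably become unnecessary.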
> On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java, line 39
> > <https://reviews.apache.org/r/15261/diff/1/?file=379000#file379000line39>
> >
> >     I think a more global view of things is needed here. For example, I think a multi-parent node will have trouble. This could show up with a group by + join on the same key, where the group by may need a combiner.
> >    
> >     It may be necessary to add more information when the TezOps are constructed.

I actually solved this in a different way. Instead of keeping a reference to the previously visited vertex, I use TezOperPlan (the parent plan of each TezOperator) to retrieve a vertex's predecessors while visiting it. This lets me handle the multi-parent node case properly.

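The difference between the two approaches can be shown with a toy plan: asking the plan for a vertex's predecessors returns every parent, whereas remembering only the previously visited vertex yields at most one. This is a minimal sketch assuming a hypothetical `ToyPlan` class whose `getPredecessors` method is loosely modeled on the lookup described above; it is not the actual TezOperPlan API. The example uses the group by + join case from the review comment, where the join vertex has two parents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy DAG standing in for a plan of Tez operators (vertices).
class ToyPlan {
    // For each vertex, the vertices with an edge into it, in insertion order.
    private final Map<String, List<String>> preds = new LinkedHashMap<>();

    void addVertex(String v) {
        preds.putIfAbsent(v, new ArrayList<>());
    }

    void connect(String from, String to) {
        addVertex(from);
        addVertex(to);
        preds.get(to).add(from);
    }

    // Asking the plan returns every parent, so a multi-parent vertex
    // (e.g. a join) sees all of its inputs, not just the last one visited.
    List<String> getPredecessors(String v) {
        return preds.getOrDefault(v, Collections.emptyList());
    }

    public static void main(String[] args) {
        ToyPlan plan = new ToyPlan();
        plan.connect("load1", "groupBy");
        plan.connect("groupBy", "join");
        plan.connect("load2", "join");

        // A visitor that only remembered the previously visited vertex would
        // see a single parent for "join"; the plan reports both.
        System.out.println(plan.getPredecessors("join"));  // prints "[groupBy, load2]"
        System.out.println(plan.getPredecessors("load1")); // prints "[]"
    }
}
```

With all parents available, the optimizer can inspect each incoming edge separately and decide whether that edge (here, the one from `groupBy`) needs a combiner.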
> On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, line 294
> > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line294>
> >
> >     We can remove this, since the work I'm referencing is essentially this JIRA.
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15261/#review28317
-----------------------------------------------------------
On Nov. 6, 2013, 11:55 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15261/
> -----------------------------------------------------------
>
> (Updated Nov. 6, 2013, 11:55 p.m.)
>
>
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3555
>     https://issues.apache.org/jira/browse/PIG-3555
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Initial implementation of Tez combiner optimizer. The patch includes the following changes:
> * Factored out the CombinerOptimizer code into a utility class called CombinerOptimizerUtil, so that both the MR and Tez CombinerOptimizers use this utility class instead of duplicating code.

Rohini Palaniswamy 2013-11-06, 19:46
Rohini Palaniswamy 2013-11-06, 22:15
Cheolsoo Park 2013-11-06, 20:55