Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # dev >> Review Request 15261: PIG-3555 Initial implementation of Tez combiner optimization


Copy link to this message
-
Re: Review Request 15261: PIG-3555 Initial implementation of Tez combiner optimization

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15261/#review28317
-----------------------------------------------------------
This looks like a good general approach. I think it would help if a bit more state was maintained in the TezOp, especially during the optimization. For the MR engine, a lot of assumptions were made that may not remain true anymore, so those should all be double checked.
src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java
<https://reviews.apache.org/r/15261/#comment55130>

    I think a more global view of things is needed here. For example, I think a multi-parent node will have trouble. This could show up with a group by + join on the same key, where the group by may need a combiner.
    
    It may be necessary to add more information when the TezOps are constructed.

src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java
<https://reviews.apache.org/r/15261/#comment55131>

    Joins will have POPackages with multiple inputs.

src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java
<https://reviews.apache.org/r/15261/#comment55132>

    PlanHelper.getPhysicalOperators()? I think a LocalRearrange will always be a leaf.

src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/15261/#comment55126>

    The type of Input/Output should be part of the  TezEdgeDescriptor I think. Later we'll want to do different edges

src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/15261/#comment55128>

    Does this work properly? I was thinking that the key class, comparator, etc would also need to be a part of this conf. They're not right now but I thought tez was falling back if no edge conf was present

src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/15261/#comment55137>

    We can remove this since the work I'm referencing is essentially this JIRA

src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/15261/#comment55135>

    This should go with the rest of the edge creation. Anything that's related to shuffles or input/outputs needs to be done per edge.

src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/15261/#comment55138>

    Remove :)

src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/15261/#comment55139>

    Move to edge creation

src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/15261/#comment55140>

    plans -> plan

src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java
<https://reviews.apache.org/r/15261/#comment55143>

    What's the benefit of this? It seems best to store edges on both ends of the vertex for ease of retrieval.

src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
<https://reviews.apache.org/r/15261/#comment55146>

    Does this still hold true for tez? What about a load + group bys on different keys?

src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
<https://reviews.apache.org/r/15261/#comment55147>

    s/leaf/root/
    
    Although, does this also still hold?
- Mark Wagner
On Nov. 6, 2013, 11:04 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15261/
> -----------------------------------------------------------
>
> (Updated Nov. 6, 2013, 11:04 a.m.)
>
>
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3555
>     https://issues.apache.org/jira/browse/PIG-3555
>