Pig, mail # dev - Review Request 13950: Tez backend layout


Cheolsoo Park 2013-09-03, 21:14
Cheolsoo Park 2013-09-04, 01:16
Mark Wagner 2013-09-03, 21:48
Re: Review Request 13950: Tez backend layout
Mark Wagner 2013-09-03, 23:26


> On Sept. 3, 2013, 9:48 p.m., Mark Wagner wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java, line 29
> > <https://reviews.apache.org/r/13950/diff/1/?file=347549#file347549line29>
> >
> >     Do we still need this when we have the DAG api from Tez? It seems strange to wrap Tez things in legacy MR APIs. If this is really needed, is it general enough to be included in the Tez project?
>
> Cheolsoo Park wrote:
>     I kept TezJob (an extension of Job) and JobControlCompiler because I thought Pig scripts would generate multiple MR* Tez DAGs, and we would need to track the dependencies among them using the JobControl structure.
>    
>     I guess you're thinking of building a giant DAG out of the entire Pig script. My question is, "Can we connect reduce vertices to mapper vertices using shuffle edges?" For example, when I have MRR + MRR, can I submit them as a single DAG?
>    
>     Looking at Hive code, it looks like MRR + MRR will be submitted as two separate DAGs. Here is the comment in TezWork.java in Hive:
>    
>      * TezWork. This class encapsulates all the work objects that can be executed
>      * in a single tez job. Currently it's basically a tree with MapWork at the
>      * leaves and ReduceWork in all other nodes.

I see. Yes, I was thinking of a single Pig script turning into a single DAG. I guess with the MRR approach, that's not possible without a way to specify scheduling dependencies in Tez. Instead of engineering around that, I'd prefer to request that capability from the Tez team. Scheduling and managing complex runtime dependencies is the goal of Tez, so I think it's reasonable for that to be handled there.
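
To make the "single DAG" idea concrete, here is a minimal sketch of MRR + MRR expressed as one graph. It is only an illustration under stated assumptions: DagSketch, its methods, and the vertex names (M1, R1a, ...) are hypothetical and are not the Tez DAG API or anything in this patch. The edge from R1b to M2 is the reduce-to-map connection asked about above; whether Tez can actually schedule such an edge is the open question, and the sketch does not answer it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is NOT the Tez DAG API.
public class DagSketch {
    // Vertex name -> names of downstream vertices, kept in insertion order.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public void addVertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    public void addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        edges.get(from).add(to);
    }

    // Kahn's algorithm: returns an order in which every vertex appears
    // after all of its upstream vertices.
    public List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> outs : edges.values()) {
            for (String to : outs) {
                inDegree.merge(to, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String to : edges.get(v)) {
                if (inDegree.merge(to, -1, Integer::sum) == 0) {
                    ready.add(to);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("cycle detected");
        }
        return order;
    }

    // MRR + MRR as one graph (vertex names are made up for this sketch).
    public static DagSketch mrrPlusMrr() {
        DagSketch dag = new DagSketch();
        dag.addEdge("M1", "R1a");
        dag.addEdge("R1a", "R1b");
        dag.addEdge("R1b", "M2"); // the reduce-to-map edge in question
        dag.addEdge("M2", "R2a");
        dag.addEdge("R2a", "R2b");
        return dag;
    }

    public static void main(String[] args) {
        System.out.println(mrrPlusMrr().topologicalOrder());
    }
}
```

With the JobControl approach, the R1b to M2 edge would instead be a dependency between two separately submitted DAGs, tracked on the Pig side rather than inside Tez.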
- Mark
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13950/#review25860
-----------------------------------------------------------
On Sept. 3, 2013, 9:14 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13950/
> -----------------------------------------------------------
>
> (Updated Sept. 3, 2013, 9:14 p.m.)
>
>
> Review request for pig.
>
>
> Bugs: PIG-3448
>     https://issues.apache.org/jira/browse/PIG-3448
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Adds skeleton classes that I think we need to implement for Tez backend.
>
>
> Diffs
> -----
>
>   build.xml 7e22192
>   ivy.xml aa8f90a
>   ivy/libraries.properties 474edbd
>   src/META-INF/services/org.apache.pig.ExecType 7065767
>   src/org/apache/pig/backend/hadoop/executionengine/tez/DagUtils.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/MRROptimizer.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/MapOper.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/ReduceOper.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezExecType.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezExecutionEngine.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOpPlanVisitor.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/13950/diff/
>
>
> Testing
> -------
>
>
> Thanks,
Cheolsoo Park 2013-09-03, 22:37