Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # dev >> Review Request 16309: PIG-3629 Implement STREAM operator in Tez


+
Alex Bain 2013-12-18, 03:53
Copy link to this message
-
Re: Review Request 16309: PIG-3629 Implement STREAM operator in Tez

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16309/#review30644
-----------------------------------------------------------
Looks good. Can you add a few e2e tests to it?

- Daniel Dai
On Dec. 18, 2013, 3:53 a.m., Alex Bain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16309/
> -----------------------------------------------------------
>
> (Updated Dec. 18, 2013, 3:53 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3629
>     https://issues.apache.org/jira/browse/PIG-3629
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implement STREAM operator in Tez - https://issues.apache.org/jira/browse/PIG-3629
>
> In this patch, I do not add resources to pig-misc.jar, I just add them individually. See my discussion post: https://groups.google.com/forum/#!topic/pig-on-tez/8S80GMKhMaU
>
> Basic Changes:
> -Run the PhyPlanSetter and EndOfAllInputSetter to set the parent plan and the end-of-all input flags necessary for STREAM, just like in MR Pig.
> -Add a map to hold plan-specific extra local resources in TezOperPlan.java. These resources can either come from the user's directory (e.g. SHIP('/home/abain/foo')) or from HDFS (e.g. CACHE('/user/abain/bar') in HDFS).
> -Add the new class TezPOStreamVisitor that assembles all the plan-specific local resources that get added in TezOperPlan.java.
>
> Resource Manager Changes:
> -TezResourcManager resources were previously a map of java.net.URL -> Path in HDFS. Previously, the URL's were all local files, e.g. file://home/abain/pig-withouthHadoop.jar. However, the CACHE statement requires that resources already present in HDFS be able to be added as local resources. Unfortunately java.net.URL does not support hdfs:// URL's, so I changed this data structure to be a YARN URL instead. I also added methods to the ResourceManager to distinguish whether you are adding a local resource or a resource already present in HDFS.
> -CACHE also supports URL's with fragments at the end, which become a "shortcut" to the name, e.g. CACHE(/input/big-data-name.gz#data.gz). I changed the resource manager to look for a fragments and use that as the resource name (if the fragment exist). This results in the symlink to the resource being created with the fragment name, which is what we want.
>
>
> Diffs
> -----
>
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java 37566ab
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 7a1736a
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 2584501
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 96ccdde
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java 135b933
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java 0cc8e17
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java 673fd70
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java 0fd7575
>
> Diff: https://reviews.apache.org/r/16309/diff/
>
>
> Testing
> -------
>
> Added a unit test to TestTezCompiler.java
> Added an e2e test to tez.conf
>
> ant test-tez passes
> ant test-e2e-tez has three failures - I am investigating to see if they are releated, or perhaps just transient
>
> Question: There is already a separate suite of STREAM tests in streaming.conf. Maybe I should remove my e2e test and we should add streaming.conf as a dependency to the test-e2e-tez target? I haven't tried to run streaming.conf yet.
>
>
> Thanks,
>
> Alex Bain
>
>

+
Alex Bain 2013-12-24, 01:34
+
Rohini Palaniswamy 2013-12-26, 18:08
+
Alex Bain 2014-01-10, 22:34
+
Cheolsoo Park 2013-12-24, 03:42
+
Cheolsoo Park 2013-12-24, 05:32
+
Alex Bain 2014-01-10, 22:35
+
Rohini Palaniswamy 2014-02-16, 23:44