Pig, mail # dev - Review Request: disable optimizations via pig properties


Re: Review Request: disable optimizations via pig properties
Travis Crawford 2013-05-14, 17:23

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11032/
-----------------------------------------------------------

(Updated May 14, 2013, 5:23 p.m.)
Review request for pig, Julien Le Dem, Bill Graham, and Feng Peng.
Changes
-------

I believe this patch addresses all the issues raised during review. Thanks for the suggestions!

* Docs have been updated to clarify both methods available for disabling optimizations: the -t command-line option and the pig.optimizer.rules.disabled property. When introducing pig.optimizer.rules.disabled, we now link to http://pig.apache.org/docs/r0.11.1/start.html#properties, which discusses how properties are initialized.

* Docs have been updated to mention that FilterLogicExpressionSimplifier is an exception, since that rule is disabled by default. There are no related code changes; we just document existing behavior.

* PigConstants has been annotated as public to make clear that its constants are intended for use by Pig's users.

* A new org.apache.pig.impl.PigImplConstants class has been created to store internal constants. It has the appropriate @InterfaceAudience annotation, and a javadoc link to PigConstants in case someone comes looking for public constants.

* I did not add a comment to Preconditions.checkArgument(ruleSet != null); because I think it's pretty clear as-is, but if you feel strongly about this I can add the comment.
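
The rule-skipping behavior can be illustrated with a minimal Java sketch. This is not the code from the patch. The class RuleFilterSketch and its method enabledRules are hypothetical. The sketch also assumes that the rules named with -t and those listed in pig.optimizer.rules.disabled have already been collected into a single set of rule names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: an optimizer skipping disabled rules by name.
// Class and method names are illustrative, not Pig's actual API.
public class RuleFilterSketch {

    public static List<String> enabledRules(List<String> ruleSet, Set<String> disabled) {
        // Same intent as Preconditions.checkArgument(ruleSet != null).
        if (ruleSet == null) {
            throw new IllegalArgumentException("ruleSet must not be null");
        }
        List<String> enabled = new ArrayList<>();
        for (String rule : ruleSet) {
            // Keep a rule unless its name appears in the disabled set.
            if (disabled == null || !disabled.contains(rule)) {
                enabled.add(rule);
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("PushUpFilter", "ColumnMapKeyPrune", "LimitOptimizer");
        Set<String> disabled = new HashSet<>(Arrays.asList("ColumnMapKeyPrune"));
        System.out.println(enabledRules(rules, disabled)); // [PushUpFilter, LimitOptimizer]
    }
}
```

Filtering by name keeps the original rule order. The same code path therefore serves both sources of disabled rules.
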
Description
-------

Update Pig to allow disabling optimizations via Pig properties. Currently, optimizations can only be disabled via command-line options. Pig properties can be set in pig.properties, with "set" commands in the script itself, or with -D options on the command line.
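
The following Java sketch shows how the property value could be read from pig.properties-style text using only java.util.Properties. The class DisabledRulesFromProperties is hypothetical. The sketch assumes the value is a comma-separated list of rule names; it does not model how Pig merges properties from its different sources.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: reading the disabled-rules property.
// Assumes a comma-separated value; not Pig's actual parsing code.
public class DisabledRulesFromProperties {

    // Property name taken from the review request.
    static final String DISABLED_RULES_KEY = "pig.optimizer.rules.disabled";

    public static List<String> disabledRules(Properties props) {
        List<String> rules = new ArrayList<>();
        String value = props.getProperty(DISABLED_RULES_KEY, "");
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            // Skip blanks left by stray commas or an unset property.
            if (!trimmed.isEmpty()) {
                rules.add(trimmed);
            }
        }
        return rules;
    }

    public static void main(String[] args) throws IOException {
        // The same key=value line could appear in pig.properties.
        Properties props = new Properties();
        props.load(new StringReader("pig.optimizer.rules.disabled=ColumnMapKeyPrune\n"));
        System.out.println(disabledRules(props)); // [ColumnMapKeyPrune]
    }
}
```
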

The use case is a script that requires certain optimizations to be disabled: the script itself should be able to disable them. Currently, whatever runs the script must specially handle disabling the optimization for that specific query.
This addresses bug PIG-3317.
    https://issues.apache.org/jira/browse/PIG-3317
Diffs (updated)
-----

  src/docs/src/documentation/content/xdocs/perf.xml 108ae7e
  src/org/apache/pig/Main.java f97ed9f
  src/org/apache/pig/PigConstants.java ea77e97
  src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 4dab4e8
  src/org/apache/pig/impl/PigImplConstants.java PRE-CREATION
  src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java d26f381
  test/org/apache/pig/test/TestEvalPipeline2.java 39cf807

Diff: https://reviews.apache.org/r/11032/diff/
Testing
-------

Manually tested on a fully-distributed cluster.

THIS FAILS:
PIG_CONF_DIR=/etc/pig/conf ./bin/pig -c query.pig

THIS WORKS:
PIG_CONF_DIR=/etc/pig/conf ./bin/pig -Dpig.optimizer.rules.disabled=ColumnMapKeyPrune -c query.pig

Note that "-Dpig.optimizer.rules.disabled=ColumnMapKeyPrune" sets a Pig property, so the same setting could instead go in pig.properties or in the script itself.
Failure message:

Pig Stack Trace
---------------
ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 97550 Input: 0 Column: 1)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
at org.apache.pig.PigServer.explain(PigServer.java:1057)
at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:419)
at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:351)
at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
at org.apache.pig.Main.run(Main.java:607)
at org.apache.pig.Main.main(Main.java:152)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:281)
at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
at org.apache.pig.PigServer.explain(PigServer.java:1042)
... 10 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 97550 Input: 0 Column: 1)
at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:142)
at org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:128)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:124)
at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher.transformed(ProjectionPatcher.java:48)
at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
... 13 more
Thanks,

Travis Crawford