Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # dev >> Review Request 15634: PIG-3525 PigStats.get() and ScriptState.get() shouldn't return MR-specific objects


Copy link to this message
-
Review Request 15634: PIG-3525 PigStats.get() and ScriptState.get() shouldn't return MR-specific objects

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15634/
-----------------------------------------------------------

Review request for pig, Aniket Mokashi, Daniel Dai, and Rohini Palaniswamy.
Bugs: PIG-3525
    https://issues.apache.org/jira/browse/PIG-3525
Repository: pig-git
Description
-------

This is my 2nd attempt to fix PIG-3525. Please review this patch and ignore the previous one.

Problem:
PigStats and ScriptState are thread local variables that are accessible via PigStats.get() and ScriptState.get(). Currently, these get() functions can return MR-specific objects regardless of the exec type (MR or non-MR).

Fix:
I am introducing the following contract for PigStats and ScriptState:
* PigStats.start(PigStats) and ScriptState.start(ScriptState) must be called before any calls to PigStats.get() and ScriptState.get() are made.
* If not, PigStats.get() and ScriptState.get() will return null since PigStats and ScriptState objects are not initialized.

In addition, I add the following lines to the PigServer constructor:

-----
if (PigStats.get() == null) {
    PigStats.start(pigContext.getExecutionEngine().instantiatePigStats());
}

if (ScriptState.get() == null) {
    ScriptState.start(pigContext.getExecutionEngine().instantiateScriptState());
}
-----

This initializes PigStats and ScriptState at start-up for both interactive and batch modes. The only exception is embedded mode where the main PigStats is set to EmbeddedPigStats in Main class. EmbeddedPigStats is really a collection of PigStats objects, and each running thread creates its own PigStats object in BoundScript class.

Note that I decided to not infer the object type in the get() functions because that requires access to PigContext, resulting in a lot of code changes.
Diffs
-----

  src/org/apache/pig/Main.java 9ba1b86
  src/org/apache/pig/PigServer.java c0826ea
  src/org/apache/pig/backend/executionengine/ExecutionEngine.java 56a958b
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRExecutionEngine.java ddc89f9
  src/org/apache/pig/scripting/BoundScript.java 28340d1
  src/org/apache/pig/tools/pigstats/PigStats.java ff0d0f9
  src/org/apache/pig/tools/pigstats/PigStatsUtil.java 84c52de
  src/org/apache/pig/tools/pigstats/ScriptState.java e3a2ba2
  test/org/apache/pig/test/TestJobControlCompiler.java 1ef5b4d
  test/org/apache/pig/test/TestPOPartialAggPlan.java f6e84d1
  test/org/apache/pig/test/TestPlanGeneration.java 506857f
  test/org/apache/pig/test/TestScriptLanguage.java cb7eb9d

Diff: https://reviews.apache.org/r/15634/diff/
Testing
-------

I ran all the unit tests and fixed the following tests-
* TestJobControlCompiler
* TestPOPartialAggPlan
* TestPlanGeneration
* TestScriptLanguage

There were two reasons why these tests failed-
* PigStats and ScriptState were never initialized, and thus, PigStats.get() and ScriptState.get() caused NPEs. I fixed this by explicitly creating a PigServer object.
* PigServer was created by JUnit before, and PigStats was set to SimplePigStats instead of EmbeddedPigStats in embedded mode. I fixed this by moving the call to PigServer constructor out of JUnit before.
Thanks,

Cheolsoo Park

+
Rohini Palaniswamy 2013-11-18, 02:02