Review Request 15194: Support multiple inputs for PigProcessor

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15194/
-----------------------------------------------------------

Review request for pig, Cheolsoo Park and Daniel Dai.
Bugs: PIG-3527
    https://issues.apache.org/jira/browse/PIG-3527
Repository: pig-git
Description
-------

Adds support for multiple LogicalInputs to the PigProcessor. This is done by adding a new TezLoad interface which PhysicalOperators may implement. On the backend, any operator implementing this interface will have the LogicalInput attached to it. Two implementations are included:
* POSimpleTezLoad which consumes a single MRInput
* POShuffleTezLoad which consumes one or more ShuffledMergedInputs.
POShuffleTezLoad does a k-way merge of the shuffle inputs and packages the result for the operator pipeline. This required a change to the comparators used so that the sort order remains consistent. There is also a fix to POForEach, which was using the incorrect status code for signaling (although it produced the same end result in the MR pipeline).
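
For readers unfamiliar with the technique, the k-way merge mentioned above can be sketched as follows. This is a generic illustration, not the code in the patch: the class KWayMergeSketch, the Cursor helper, and the merge method are hypothetical names, and plain Java lists stand in for the ShuffledMergedInputs. It assumes every input is already sorted by the same comparator that drives the merge, which is presumably why the comparators had to agree on sort order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a k-way merge; not taken from the Pig patch.
public class KWayMergeSketch {

    // Pairs the current head element of one sorted input with the rest of that input.
    private static final class Cursor<T> {
        final T head;
        final Iterator<T> rest;

        Cursor(T head, Iterator<T> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // Merges inputs that are each already sorted by cmp into one sorted list.
    // A min-heap ordered on each input's head yields the next smallest element.
    public static <T> List<T> merge(List<? extends Iterable<T>> inputs,
                                    Comparator<? super T> cmp) {
        PriorityQueue<Cursor<T>> heap =
            new PriorityQueue<>((a, b) -> cmp.compare(a.head, b.head));

        // Seed the heap with the first element of every non-empty input.
        for (Iterable<T> input : inputs) {
            Iterator<T> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor<>(it.next(), it));
            }
        }

        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor<T> smallest = heap.poll();
            out.add(smallest.head);
            // Refill the heap from the input that just supplied an element.
            if (smallest.rest.hasNext()) {
                heap.add(new Cursor<>(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> inputs = Arrays.asList(
            Arrays.asList(1, 4, 7),
            Arrays.asList(2, 5, 8),
            Arrays.asList(3, 6, 9));
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(merge(inputs, Comparator.<Integer>naturalOrder()));
    }
}
```

If an input were sorted with a different ordering than the one passed to merge, the output would no longer be globally sorted, so the inputs and the merge must share one comparator.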
Diffs
-----

  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigDecimalRawComparator.java ddea99e
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBigIntegerRawComparator.java 5ea3fc7
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBooleanRawComparator.java dfd4ebf
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBytesRawComparator.java 09397e5
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigDateTimeRawComparator.java a87161f
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigDoubleRawComparator.java cbf457f
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigFloatRawComparator.java 1d86e3f
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigIntRawComparator.java bb6c9df
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigLongRawComparator.java b3ded76
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigSecondaryKeyComparator.java 5ad334b
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTextRawComparator.java 022f37b
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 866c39d
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleSortComparator.java 9724b9f
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/POSimpleTezLoad.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/TezLoad.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java eb9f62a
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 86314d9
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackageLite.java c200715
  src/org/apache/pig/backend/hadoop/executionengine/tez/FileInputHandler.java d29e330
  src/org/apache/pig/backend/hadoop/executionengine/tez/InputHandler.java d2298ca
  src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java ebb3145
  src/org/apache/pig/backend/hadoop/executionengine/tez/ShuffledInputHandler.java d7b42b8
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 45e47b0
  src/org/apache/pig/data/BinInterSedes.java b3ec51e
  src/org/apache/pig/data/DefaultTuple.java 2e7ca5f
  test/e2e/pig/tests/tez.conf 24af8d3

Diff: https://reviews.apache.org/r/15194/diff/
Testing
-------

Tested manually, and an e2e test has been added. Because of the comparator change, some of the tests fail due to differences in bag ordering.
Thanks,

Mark Wagner