Pig, mail # dev - Review Request 15931: PIG-3585 Implement union in Tez


Cheolsoo Park 2013-12-01, 07:00
Re: Review Request 15931: PIG-3585 Implement union in Tez
Rohini Palaniswamy 2013-12-01, 17:06

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15931/#review29567
-----------------------------------------------------------

Ship it!
The code is fine when the union comes after some processing. But for a simple load-and-union case like the one below, it will create 3 vertices: 2 load vertices and one union vertex.

a = load 'a';
b = load 'b';
c = union a, b;

In MR, this is handled in a simple map:

C: Store(/tmp/tezout:PigStorage) - scope-23
|
|---C: Union[bag] - scope-22
    |
    |---A: New For Each(false,false,false)[bag] - scope-10
    |   |   |
    |   ..........
    |
    |---B: New For Each(false,false,false)[bag] - scope-21
        |   |
        |   .........
        |
        |---B: Load(/tmp/data:org.apache.pig.builtin.PigStorage) - scope-11

We should also try to do that in a single vertex, which would be more efficient. We can handle that in a separate JIRA, though.
- Rohini Palaniswamy
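
To make the vertex-count point in the comment above concrete, here is a minimal sketch of the two plan shapes for the load-and-union script. It is a toy model only: `UnionPlanSketch` and its methods are hypothetical names invented for illustration and are not part of the Pig or Tez APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a plan, used only to count vertices; not Pig or Tez API. */
final class UnionPlanSketch {

    /** Shape described in the review: one vertex per load, plus one union vertex. */
    static List<String> separateUnionVertex(List<String> loads) {
        List<String> vertices = new ArrayList<>();
        for (String load : loads) {
            vertices.add("load(" + load + ")");
        }
        vertices.add("union");
        return vertices;
    }

    /** Suggested shape: all loads and the union run inside a single vertex. */
    static List<String> singleVertex(List<String> loads) {
        List<String> vertices = new ArrayList<>();
        vertices.add("load+union(" + String.join(",", loads) + ")");
        return vertices;
    }
}
```

For the two inputs `a` and `b`, `separateUnionVertex` yields three entries and `singleVertex` yields one, which is the difference the comment is pointing at.
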
On Dec. 1, 2013, 7 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15931/
> -----------------------------------------------------------
>
> (Updated Dec. 1, 2013, 7 a.m.)
>
>
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3585
>     https://issues.apache.org/jira/browse/PIG-3585
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> This patch implements union as follows: load vertices -> broadcast edges -> union vertex.
>
> The changes include:
> * In the front-end, TezCompiler converts POUnion into a new vertex and connects it to its predecessors with broadcast edges.
> * In the back-end, a new POPackage class called POBroadcastTezLoad is added. This class implements the TezLoad interface; it pulls every record from the ShuffledUnorderedKVInputs in order and unions them.
>
>
> Diffs
> -----
>
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java e49de40
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POBroadcastTezLoad.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 9a2b499
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 529bf30
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java e3f5a5d
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java dcd6a5a
>   test/e2e/pig/tests/tez.conf 7fd5fb1
>
> Diff: https://reviews.apache.org/r/15931/diff/
>
>
> Testing
> -------
>
> * A new e2e test case is added.
> * ant test-tez passes.
> * All e2e tests pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
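
The back-end behavior described in the quoted request (pull every record from each input in order, then union them) can be sketched roughly as follows. This is not the patch's code: `OrderedUnionSketch` and `unionInOrder` are hypothetical names, plain `Iterable`s stand in for the Tez inputs, and the sketch assumes "in order" means draining one input completely before moving on to the next.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for a union over several inputs; not Pig or Tez API. */
final class OrderedUnionSketch {

    /** Drains each input completely, in the order given, into one output list. */
    static <T> List<T> unionInOrder(List<? extends Iterable<? extends T>> inputs) {
        List<T> out = new ArrayList<>();
        for (Iterable<? extends T> input : inputs) {
            // Assumption: an input is fully consumed before the next one is read.
            for (T record : input) {
                out.add(record);
            }
        }
        return out;
    }
}
```

A union of this kind keeps duplicates and does no sorting or grouping; it only concatenates the record streams.
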

Cheolsoo Park 2013-12-01, 17:25
Cheolsoo Park 2013-12-01, 21:11
Rohini Palaniswamy 2013-12-01, 19:16