Pig >> mail # user >> Understanding "Explain" operator


Re: Understanding "Explain" operator
Alan's book Programming Pig, Ch. 7 has a good section on this. Also
try the -dot option described at
http://pig.apache.org/docs/r0.9.1/test.html#explain to get the plans
generated as a diagram (DOT graph output).
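
As a rough sketch of what that looks like with your script (assuming the
EXPLAIN syntax from the 0.9.1 docs linked above; 'plans' is just a
hypothetical output directory name I picked for illustration):

```pig
-- Same script as in the quoted message below.
Data = LOAD 'input.csv' USING PigStorage(',');
IDs = FOREACH Data GENERATE $0;
UniqueID = DISTINCT IDs PARALLEL 40;

-- Default behaviour: text plans printed to the console.
EXPLAIN UniqueID;

-- DOT output written under 'plans' (hypothetical path) instead of stdout.
EXPLAIN -dot -out plans UniqueID;
```

The resulting .dot files can then be rendered with Graphviz, if you have
it installed. Exactly which plans get DOT output may vary by Pig version,
so treat this as a starting point rather than a guaranteed recipe.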

Which specific part of the output are you having trouble understanding, though?

On Tue, Jan 31, 2012 at 3:02 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
> Can anyone help me understand the "Explain" operator in Pig?
>
> I know it gives the logical, physical, and Map/Reduce plans for the Pig
> script we execute, but the output of "Explain" is tricky to understand.
>
> I know what I am trying to do in Pig. What I want to know is what
> I can learn from the Explain operator and how I can use its output.
> Can anyone help me understand that?
>
> For example, if I have the following Pig script:
>
> Data = Load 'input.csv' using PigStorage(',');
> IDs = FOREACH Data GENERATE $0;
> UniqueID = Distinct IDs parallel 40;
> Explain IDs;
> Explain UniqueID;
> Dump UniqueID;
>
>
>
>
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> IDs: (Name: LOStore Schema: #4:bytearray)
> |
> |---IDs: (Name: LOForEach Schema: #4:bytearray)
>    |   |
>    |   (Name: LOGenerate[false] Schema: #4:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[4]
>    |   |   |
>    |   |   (Name: Project Type: bytearray Uid: 4 Input: 0 Column: (*))
>    |   |
>    |   |---(Name: LOInnerLoad[0] Schema: #4:bytearray)
>    |
>    |---Data: (Name: LOLoad Schema: null)RequiredFields:null
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
> |
> |---IDs: New For Each(false)[bag] - scope-3
>    |   |
>    |   Project[bytearray][0] - scope-1
>    |
>    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0
>
> 2012-01-31 03:25:41,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
> 2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-5
> Map Plan
> IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
> |
> |---IDs: New For Each(false)[bag] - scope-3
>    |   |
>    |   Project[bytearray][0] - scope-1
>    |
>    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0--------
> Global sort: false
> ----------------
>
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> UniqueID: (Name: LOStore Schema: #6:bytearray)
> |
> |---UniqueID: (Name: LODistinct Schema: #6:bytearray)
>    |
>    |---IDs: (Name: LOForEach Schema: #6:bytearray)
>        |   |
>        |   (Name: LOGenerate[false] Schema: #6:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[6]
>        |   |   |
>        |   |   (Name: Project Type: bytearray Uid: 6 Input: 0 Column: (*))
>        |   |
>        |   |---(Name: LOInnerLoad[0] Schema: #6:bytearray)
>        |
>        |---Data: (Name: LOLoad Schema: null)RequiredFields:null
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
> |
> |---UniqueID: PODistinct[bag] - scope-10
>    |
>    |---IDs: New For Each(false)[bag] - scope-9
>        |   |
>        |   Project[bytearray][0] - scope-7
>        |
>        |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6

Harsh J
Customer Ops. Engineer, Cloudera