Pig >> mail # user >> Understanding "Explain" operator


Re: Understanding "Explain" operator
Alan's book Programming Pig, Ch. 7, has a good section on this. Also
try the -dot option described at
http://pig.apache.org/docs/r0.9.1/test.html#explain to generate a
diagram representation of the plans.
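
As a rough sketch only, assuming the script quoted below and a hypothetical
output path named explain_out (not something from the original script), the
-dot option could be combined with -out like this:

```pig
-- Same statements as the quoted script, with keywords upper-cased.
Data = LOAD 'input.csv' USING PigStorage(',');
IDs = FOREACH Data GENERATE $0;
UniqueID = DISTINCT IDs PARALLEL 40;

-- -dot prints the plans in DOT format instead of the default text trees.
-- -out sends the output to the given path instead of the console;
-- 'explain_out' is a hypothetical name chosen for illustration.
EXPLAIN -dot -out explain_out UniqueID;
```

The DOT output is meant to be passed to the Graphviz dot utility to render
an image of each plan. The exact file layout under the output path may
vary by Pig version, so check what gets written before scripting around it.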

Which specific part of the output are you having trouble understanding though?

On Tue, Jan 31, 2012 at 3:02 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
> Can anyone help me understand the "Explain" operator in Pig?
>
> I know it prints the logical, physical, and MapReduce plans for the Pig
> script we execute, but its output is tricky to understand.
>
> I know what I am trying to do in Pig. What I want to know is what
> I can learn from the Explain operator and how I can use its output.
> Can anyone help me understand that?
>
> For example, if I have the following Pig script:
>
> Data = Load 'input.csv' using PigStorage(',');
> IDs = FOREACH Data GENERATE $0;
> UniqueID = Distinct IDs parallel 40;
> Explain IDs;
> Explain UniqueID;
> Dump UniqueID;
>
>
>
>
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> IDs: (Name: LOStore Schema: #4:bytearray)
> |
> |---IDs: (Name: LOForEach Schema: #4:bytearray)
>    |   |
>    |   (Name: LOGenerate[false] Schema: #4:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[4]
>    |   |   |
>    |   |   (Name: Project Type: bytearray Uid: 4 Input: 0 Column: (*))
>    |   |
>    |   |---(Name: LOInnerLoad[0] Schema: #4:bytearray)
>    |
>    |---Data: (Name: LOLoad Schema: null)RequiredFields:null
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
> |
> |---IDs: New For Each(false)[bag] - scope-3
>    |   |
>    |   Project[bytearray][0] - scope-1
>    |
>    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0
>
> 2012-01-31 03:25:41,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
> 2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-5
> Map Plan
> IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
> |
> |---IDs: New For Each(false)[bag] - scope-3
>    |   |
>    |   Project[bytearray][0] - scope-1
>    |
>    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0--------
> Global sort: false
> ----------------
>
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> UniqueID: (Name: LOStore Schema: #6:bytearray)
> |
> |---UniqueID: (Name: LODistinct Schema: #6:bytearray)
>    |
>    |---IDs: (Name: LOForEach Schema: #6:bytearray)
>        |   |
>        |   (Name: LOGenerate[false] Schema: #6:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[6]
>        |   |   |
>        |   |   (Name: Project Type: bytearray Uid: 6 Input: 0 Column: (*))
>        |   |
>        |   |---(Name: LOInnerLoad[0] Schema: #6:bytearray)
>        |
>        |---Data: (Name: LOLoad Schema: null)RequiredFields:null
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
> |
> |---UniqueID: PODistinct[bag] - scope-10
>    |
>    |---IDs: New For Each(false)[bag] - scope-9
>        |   |
>        |   Project[bytearray][0] - scope-7
>        |
>        |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6

Harsh J
Customer Ops. Engineer, Cloudera