
Pig, mail # user - Understanding "Explain" operator


Understanding "Explain" operator
praveenesh kumar 2012-01-31, 09:32
Can anyone help me understand the "Explain" operator in Pig?

I know it shows the logical, physical, and MapReduce plans for the Pig script we execute, but its output is tricky to understand.

I know what I am trying to do in Pig. What I want to know is what information I can get from the Explain operator and how I can use its output. Can anyone help me understand that?

For example, suppose I have the following Pig script:

Data = Load 'input.csv' using PigStorage(',');
IDs = FOREACH Data GENERATE $0;
UniqueID = Distinct IDs parallel 40;
Explain IDs;
Explain UniqueID;
Dump UniqueID;
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
IDs: (Name: LOStore Schema: #4:bytearray)
|
|---IDs: (Name: LOForEach Schema: #4:bytearray)
    |   |
    |   (Name: LOGenerate[false] Schema: #4:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[4]
    |   |   |
    |   |   (Name: Project Type: bytearray Uid: 4 Input: 0 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: #4:bytearray)
    |
    |---Data: (Name: LOLoad Schema: null)RequiredFields:null

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
|
|---IDs: New For Each(false)[bag] - scope-3
    |   |
    |   Project[bytearray][0] - scope-1
    |
    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0

2012-01-31 03:25:41,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-5
Map Plan
IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
|
|---IDs: New For Each(false)[bag] - scope-3
    |   |
    |   Project[bytearray][0] - scope-1
    |
    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0--------
Global sort: false
----------------
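Each plan above is printed as a tree with the final operator (here the Store) at the top and its inputs nested underneath with "|---", so reading from the bottom up follows the data: Load, then the FOREACH, then the Store. Lines behind a "|   " prefix (such as the Project under New For Each) belong to that operator's inner plan rather than the main data flow. The following is a minimal Python sketch that recovers that load-to-store order from one plan. It assumes one operator per line (not email-wrapped) and the 4-column indentation seen in this output; `plan_nodes` and `data_flow` are hypothetical helper names.

```python
from typing import List, Tuple


def plan_nodes(plan_lines: List[str]) -> List[Tuple[int, str]]:
    """Return (depth, label) for each operator on the main tree of one plan.

    Assumes the layout printed in this thread: the root operator starts in
    column 0 and each input is printed as '|---', indented 4 columns per level.
    Lines behind a '|   ' prefix belong to an operator's inner plan and are skipped.
    """
    nodes: List[Tuple[int, str]] = []
    for line in plan_lines:
        body = line.lstrip(" ")
        indent = len(line) - len(body)
        if not body.strip():
            continue
        if indent == 0 and not body.startswith("|"):
            nodes.append((0, body.strip()))  # root: the final (sink) operator
        elif body.startswith("|---"):
            nodes.append((indent // 4 + 1, body[4:].strip()))  # an input of the node above
    return nodes


def data_flow(plan_lines: List[str]) -> List[str]:
    """Labels ordered so every operator comes after its inputs.

    The tree is printed parent-first (pre-order), so reversing that order
    puts inputs before the operators that consume them.
    """
    return [label for _, label in reversed(plan_nodes(plan_lines))]
```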

#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
UniqueID: (Name: LOStore Schema: #6:bytearray)
|
|---UniqueID: (Name: LODistinct Schema: #6:bytearray)
    |
    |---IDs: (Name: LOForEach Schema: #6:bytearray)
        |   |
        |   (Name: LOGenerate[false] Schema: #6:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[6]
        |   |   |
        |   |   (Name: Project Type: bytearray Uid: 6 Input: 0 Column: (*))
        |   |
        |   |---(Name: LOInnerLoad[0] Schema: #6:bytearray)
        |
        |---Data: (Name: LOLoad Schema: null)RequiredFields:null

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
|
|---UniqueID: PODistinct[bag] - scope-10
    |
    |---IDs: New For Each(false)[bag] - scope-9
        |   |
        |   Project[bytearray][0] - scope-7
        |
        |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6

2012-01-31 03:25:41,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-31 03:25:41,898 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-01-31 03:25:41,898 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-12
Map Plan
Local Rearrange[tuple]{tuple}(true) - scope-14
|   |
|   Project[tuple][*] - scope-13
|
|---IDs: New For Each(false)[bag] - scope-9
    |   |
    |   Project[bytearray][0] - scope-7
    |
    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6--------
Reduce Plan
UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
|
|---New For Each(true)[bag] - scope-17
    |   |
    |   Project[tuple][0] - scope-16
    |
    |---Package[tuple]{tuple} - scope-15--------
Global sort: false
Thanks,
Praveenesh
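The MapReduce plan is the part of the output that describes what will actually run on the cluster: one "MapReduce node" per job, and for each job the operators in its map and reduce phases. In the Distinct output above, the map side ends in a Local Rearrange (which emits key/value pairs for the shuffle) and the reduce side starts with a Package (which collects the shuffled tuples), so Distinct costs a full map-and-reduce job, while the plain FOREACH compiled to a map-only job. The scope-N ids are shared with the physical plan (scope-6, scope-7 and scope-9 appear in both), so each phase operator can be traced back. The following is a minimal Python sketch that lists the jobs and their per-phase operators. It assumes the layout printed in this thread (phase header names, "alias: " prefixes, 4-column indentation, a "Global sort" footer per node); `mr_jobs` is a hypothetical helper, not part of Pig.

```python
import re
from typing import Dict, List, Optional

PHASES = ("Map Plan", "Combine Plan", "Reduce Plan")
ALIAS_RE = re.compile(r"^\w+: ")             # e.g. the "UniqueID: " in front of Store
NAME_RE = re.compile(r"[A-Za-z][A-Za-z ]*")  # operator name, up to the first non-letter/space


def mr_jobs(mr_plan_lines: List[str]) -> List[Dict[str, List[str]]]:
    """One dict per 'MapReduce node', mapping phase name -> operator names.

    Only main-tree operators are kept (inner-plan lines behind a '|   ' prefix
    are ignored), listed input-first, i.e. the reverse of the printed order.
    """
    jobs: List[Dict[str, List[str]]] = []
    phase: Optional[str] = None
    for line in mr_plan_lines:
        text = line.strip()
        top_level = not line.startswith((" ", "|")) or line.lstrip(" ").startswith("|---")
        if text.startswith("MapReduce node"):
            jobs.append({})
            phase = None
        elif text in PHASES and jobs:
            phase = text
            jobs[-1][phase] = []
        elif text.startswith("Global sort"):
            phase = None  # per-node footer: no more operators for this node
        elif phase is not None and text and top_level:
            op = ALIAS_RE.sub("", text.lstrip("|-"), count=1)
            name = NAME_RE.match(op)
            if name:
                jobs[-1][phase].insert(0, name.group().strip())  # printed sink-first
    return jobs
```

On the Distinct plan above this yields one job whose map phase is Load, New For Each, Local Rearrange and whose reduce phase is Package, New For Each, Store; on the IDs plan it yields a single job with only a Map Plan. That distinction is also where "parallel 40" matters: PARALLEL sets the number of reduce tasks, so it only affects jobs that have a Reduce Plan.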
Harsh J 2012-01-31, 17:08