Pig >> mail # user >> Understanding "Explain" operator


Understanding "Explain" operator
Can anyone help me understand the "Explain" operator in Pig?

I know it prints the logical, physical and Map/Reduce plans for the Pig
script we execute, but the output of "Explain" is kind of tricky to
understand.

I know what I am trying to do in Pig. What I want to know is what I can
learn from the Explain operator and how I can use its output. Can anyone
help me understand that?

For example, if I have the following Pig script:

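-- Load the comma-separated input; no schema is declared, so fields are bytearrays.
Data = LOAD 'input.csv' USING PigStorage(',');
-- Keep only the first column.
IDs = FOREACH Data GENERATE $0;
-- Remove duplicate values, requesting 40 reducers.
UniqueID = DISTINCT IDs PARALLEL 40;
-- Print the plans for each alias, then run the job.
EXPLAIN IDs;
EXPLAIN UniqueID;
DUMP UniqueID;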

The output of the two Explain statements is:

#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
IDs: (Name: LOStore Schema: #4:bytearray)
|
|---IDs: (Name: LOForEach Schema: #4:bytearray)
    |   |
    |   (Name: LOGenerate[false] Schema: #4:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[4]
    |   |   |
    |   |   (Name: Project Type: bytearray Uid: 4 Input: 0 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: #4:bytearray)
    |
    |---Data: (Name: LOLoad Schema: null)RequiredFields:null

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
|
|---IDs: New For Each(false)[bag] - scope-3
    |   |
    |   Project[bytearray][0] - scope-1
    |
    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0

2012-01-31 03:25:41,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-01-31 03:25:41,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-5
Map Plan
IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
|
|---IDs: New For Each(false)[bag] - scope-3
    |   |
    |   Project[bytearray][0] - scope-1
    |
    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-0--------
Global sort: false
----------------

#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
UniqueID: (Name: LOStore Schema: #6:bytearray)
|
|---UniqueID: (Name: LODistinct Schema: #6:bytearray)
    |
    |---IDs: (Name: LOForEach Schema: #6:bytearray)
        |   |
        |   (Name: LOGenerate[false] Schema: #6:bytearray)ColumnPrune:InputUids=[]ColumnPrune:OutputUids=[6]
        |   |   |
        |   |   (Name: Project Type: bytearray Uid: 6 Input: 0 Column: (*))
        |   |
        |   |---(Name: LOInnerLoad[0] Schema: #6:bytearray)
        |
        |---Data: (Name: LOLoad Schema: null)RequiredFields:null

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
|
|---UniqueID: PODistinct[bag] - scope-10
    |
    |---IDs: New For Each(false)[bag] - scope-9
        |   |
        |   Project[bytearray][0] - scope-7
        |
        |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6

2012-01-31 03:25:41,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-01-31 03:25:41,898 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-01-31 03:25:41,898 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-12
Map Plan
Local Rearrange[tuple]{tuple}(true) - scope-14
|   |
|   Project[tuple][*] - scope-13
|
|---IDs: New For Each(false)[bag] - scope-9
    |   |
    |   Project[bytearray][0] - scope-7
    |
    |---Data: Load(/AllStateInputs/input.csv:PigStorage(',')) - scope-6--------
Reduce Plan
UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
|
|---New For Each(true)[bag] - scope-17
    |   |
    |   Project[tuple][0] - scope-16
    |
    |---Package[tuple]{tuple} - scope-15--------
Global sort: false
Thanks,
Praveenesh
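
One practical use of the Map Reduce Plan sections above is to see how many MapReduce jobs a script compiles to and which of them need a reduce phase. The following is only a minimal Python sketch of that idea, not part of Pig. It assumes the Explain output has been captured as text in the format shown in this message; the function name summarize_mr_plan and the shortened sample string are hypothetical, made up for illustration.

```python
import re


def summarize_mr_plan(explain_text):
    """Return (node_name, has_reduce) for each MapReduce node found.

    Assumption: each job starts with a line 'MapReduce node scope-N',
    and a job that needs a reduce phase contains a 'Reduce Plan' line,
    as in the Explain output quoted in this message.
    """
    jobs = []
    for raw_line in explain_text.splitlines():
        line = raw_line.strip()
        match = re.match(r"MapReduce node (\S+)", line)
        if match:
            # A new job begins; assume map-only until a Reduce Plan is seen.
            jobs.append([match.group(1), False])
        elif line == "Reduce Plan" and jobs:
            jobs[-1][1] = True
    return [(name, has_reduce) for name, has_reduce in jobs]


# Shortened excerpt of the two Map Reduce Plan sections from this message.
sample = """\
MapReduce node scope-5
Map Plan
IDs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-4
MapReduce node scope-12
Map Plan
Local Rearrange[tuple]{tuple}(true) - scope-14
Reduce Plan
UniqueID: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-11
"""

for name, has_reduce in summarize_mr_plan(sample):
    kind = "map + reduce" if has_reduce else "map-only"
    print(name, kind)
```

Read this way, the plan for IDs is a single map-only job (scope-5), while the plan for UniqueID is a single job with both a map and a reduce phase (scope-12), because Distinct has to bring identical tuples together before duplicates can be dropped.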