Re: little help on reading debug output
Thnx!

On Jun 16, 2010, at 2:54 PM, "Aniket Mokashi" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> This is a representation of Pig's physical plan of execution. You can read
> more about it at:
> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#EXPLAIN
> http://wiki.apache.org/pig/PigExecutionModel
>
> The 7** numbers are IDs that uniquely identify operators (Logical/Physical/MR)
> in Pig [NodeIdGenerator.getNextId()].
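To make the ID scheme concrete, here is a minimal sketch of a per-scope counter. It assumes that a key such as 1-770 is a scope ("1") followed by a running counter (770). The class and method names are hypothetical and are not Pig's actual NodeIdGenerator API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT Pig's NodeIdGenerator.
class ScopedIdGenerator {
    // One running counter per scope (assumption: the "1" in "1-770" is a scope).
    private final Map<String, Long> nextIdByScope = new HashMap<>();

    // Returns the next unused id for the given scope, starting at 0.
    synchronized long getNextId(String scope) {
        long id = nextIdByScope.getOrDefault(scope, 0L);
        nextIdByScope.put(scope, id + 1);
        return id;
    }

    // Formats a key the way the debug output displays it: "<scope>-<id>".
    static String format(String scope, long id) {
        return scope + "-" + id;
    }

    public static void main(String[] args) {
        ScopedIdGenerator gen = new ScopedIdGenerator();
        String first = format("1", gen.getNextId("1"));
        String second = format("1", gen.getNextId("1"));
        System.out.println(first + " " + second); // prints "1-0 1-1"
    }
}
```

Under this assumption, every operator created while a script is compiled takes the next number, so an ID says when an operator was created, not which script line it came from.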
>
> Because multiple lines of a Pig script can generate a single MapReduce task,
> it will be hard to associate this part of the plan with a line number in the
> Pig script. But "Explain" can help you more.
>
> A lot of functionality in Pig is implemented with user functions (UDFs).
> Here is a snippet from the code explaining where and why we use the IsEmpty UDF:
> <snip>
> public static void addEmptyBagOuterJoin(PhysicalPlan fePlan, Schema inputSchema)
>         throws PlanException {
>        // we currently have POProject[bag] as the only operator in the plan
>        // If the bag is an empty bag, we should replace
>        // it with a bag with one tuple with null fields so that when we flatten
>        // we do not drop records (flatten will drop records if the bag is left
>        // as an empty bag) and actually project nulls for the fields in
>        // the empty bag
>
>        // So we need to get to the following state:
>        // POProject[Bag]
>        //         \
>        //    POUserFunc["IsEmpty()"] Const[Bag](bag with null fields)
>        //                        \      |    POProject[Bag]
>        //                         \     |    /
>        //                          POBinCond
> </snip>
> This explains the use of the IsEmpty() UDF.
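The effect described in the quoted comment can be sketched outside Pig with plain Java collections. This is an illustration under stated assumptions, not Pig's implementation: a bag is modeled as a list of tuples, a tuple as a list of fields, and the names padIfEmpty and flatten are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: models the IsEmpty / BinCond / Constant pattern with lists.
class EmptyBagOuterJoinSketch {
    // Plays the role of the BinCond: if the bag is empty, substitute a bag
    // holding one tuple of nulls; otherwise pass the bag through unchanged.
    static List<List<Object>> padIfEmpty(List<List<Object>> bag, int fieldCount) {
        if (!bag.isEmpty()) {
            return bag;
        }
        List<Object> nullTuple =
                new ArrayList<>(Collections.<Object>nCopies(fieldCount, null));
        return Collections.singletonList(nullTuple);
    }

    // Flatten: emit one output row per tuple in the bag, prefixed by the key.
    // An empty bag therefore produces no rows at all.
    static List<List<Object>> flatten(Object key, List<List<Object>> bag) {
        List<List<Object>> rows = new ArrayList<>();
        for (List<Object> tuple : bag) {
            List<Object> row = new ArrayList<>();
            row.add(key);
            row.addAll(tuple);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Object>> empty = new ArrayList<>();
        System.out.println(flatten("k", empty).size());         // prints 0
        System.out.println(flatten("k", padIfEmpty(empty, 2))); // prints [[k, null, null]]
    }
}
```

Without the padding step the key "k" disappears from the output, which is the dropped record that the outer join has to avoid.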
>
> Hope it helps.
>
> Thanks,
> Aniket
>
> On Wed, June 16, 2010 2:52 pm, Corbin Hoenes wrote:
>> Is there any documentation on how to read this output? When I 'set debug
>> on' I get this in my reducer syslog:
>>
>> DEBUG: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - New For Each(true,true)[tuple] - 1-770
>> |   |
>> |   POBinCond[bag] - 1-768
>> |   |
>> |   |---Project[bag][1] - 1-764
>> |   |
>> |   |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - 1-766
>> |   |   |
>> |   |   |---Project[bag][1] - 1-765
>> |   |
>> |   |---Constant({()}) - 1-767
>> |   |
>> |   Project[bag][2] - 1-769
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45450 records, put the rest in spill file.
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45192 records, put the rest in spill file.
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 44852 records, put the rest in spill file
>>
>>
>> Specifically, what do the 1-7** numbers mean? Is it possible to get line
>> numbers from the Pig script? :) Also strange: POUserFunc seems to be telling
>> me we are running the IsEmpty UDF, but that UDF isn't being called in this
>> script at all... Is it possible Pig is using it under the covers?
>>
>>
>>
>
>