Re: little help on reading debug output
Aniket Mokashi 2010-06-16, 20:54
Hi,
This is a representation of Pig's physical plan of execution. You can read more about it at:
http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#EXPLAIN
http://wiki.apache.org/pig/PigExecutionModel

The 7** numbers are IDs that uniquely identify operators (Logical/Physical/MR) in Pig [NodeIdGenerator.getNextId()].

Because multiple lines of a Pig script can end up in a single MapReduce task, it is hard to associate this part of the plan with a line number in the script. "Explain" can help you more here.

A lot of functionality in Pig is implemented internally with user-defined functions (UDFs), so yes, Pig is using IsEmpty under the covers. Here is a snippet from the code explaining where and why we use the IsEmpty UDF:

<snip>
public static void addEmptyBagOuterJoin(PhysicalPlan fePlan, Schema inputSchema) throws PlanException {
    // we currently have POProject[bag] as the only operator in the plan
    // If the bag is an empty bag, we should replace
    // it with a bag with one tuple with null fields so that when we flatten
    // we do not drop records (flatten will drop records if the bag is left
    // as an empty bag) and actually project nulls for the fields in
    // the empty bag
    // So we need to get to the following state:
    // POProject[Bag]
    //         \
    //    POUserFunc["IsEmpty()"] Const[Bag](bag with null fields)
    //                        \      |    POProject[Bag]
    //                         \     |    /
    //                          POBinCond
</snip>

This explains the use of the IsEmpty() UDF. Hope it helps.
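The empty-bag handling described in the snippet above can be sketched in plain Java. This is only an illustration of the idea, not Pig's implementation: a List<List<Object>> stands in for a bag of tuples, and the class and method names (EmptyBagOuterJoinSketch, isEmpty, nullBag, binCond, flatten) are hypothetical and are not part of the Pig API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Plain-Java model of the IsEmpty/BinCond rewrite. Hypothetical names, not the Pig API. */
public class EmptyBagOuterJoinSketch {

    /** Stand-in for the IsEmpty UDF: true when the bag holds no tuples. */
    public static boolean isEmpty(List<List<Object>> bag) {
        return bag.isEmpty();
    }

    /** Stand-in for the constant bag: exactly one tuple whose fields are all null. */
    public static List<List<Object>> nullBag(int fieldCount) {
        List<Object> nullTuple =
                new ArrayList<Object>(Collections.<Object>nCopies(fieldCount, null));
        List<List<Object>> bag = new ArrayList<List<Object>>();
        bag.add(nullTuple);
        return bag;
    }

    /** Stand-in for the bincond: isEmpty(bag) ? nullBag : bag. */
    public static List<List<Object>> binCond(List<List<Object>> bag, int fieldCount) {
        return isEmpty(bag) ? nullBag(fieldCount) : bag;
    }

    /** Simplified flatten: one output row per tuple in the bag, prefixed by the key. */
    public static List<List<Object>> flatten(Object key, List<List<Object>> bag) {
        List<List<Object>> rows = new ArrayList<List<Object>>();
        for (List<Object> tuple : bag) {
            List<Object> row = new ArrayList<Object>();
            row.add(key);
            row.addAll(tuple);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Object>> empty = new ArrayList<List<Object>>();
        // Flattening the empty bag directly produces no rows, so the record is dropped.
        System.out.println(flatten("k1", empty));
        // Going through the bincond first keeps the record, with nulls for the missing fields.
        System.out.println(flatten("k1", binCond(empty, 2)));
    }
}
```

In this model a key whose bag is empty still produces one output row after the bincond, which is the behavior an outer join needs.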
Thanks,
Aniket

On Wed, June 16, 2010 2:52 pm, Corbin Hoenes wrote:
> Is there any documentation on how to read this output when I 'set debug
> on' I get in my reducer syslog:
>
> DEBUG:
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce
> - New For Each(true,true)[tuple] - 1-770
> |   |
> |   POBinCond[bag] - 1-768
> |   |
> |   |---Project[bag][1] - 1-764
> |   |
> |   |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - 1-766
> |   |   |
> |   |   |---Project[bag][1] - 1-765
> |   |
> |   |---Constant({()}) - 1-767
> |   |
> |   Project[bag][2] - 1-769
> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45450
> records, put the rest in spill file.
> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45192
> records, put the rest in spill file.
> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 44852
> records, put the rest in spill file
>
> Specifically what do the 1-7** numbers mean? Is it possible to get line
> numbers from the pig script :) Also strange is that it seems that
> POUserFunc is telling me we are running the IsEmpty UDF but that UDF
> isn't being called in this script at all...is it possible pig is using it
> under the covers?