Re: little help on reading debug output
Corbin Hoenes 2010-06-17, 00:23
Thnx!
Sent from my iPhone

On Jun 16, 2010, at 2:54 PM, "Aniket Mokashi" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> This is a representation of Pig's physical plan of execution. You can read
> more about it at:
> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#EXPLAIN
> http://wiki.apache.org/pig/PigExecutionModel
>
> The 7** numbers are ids that uniquely identify operators (logical, physical
> and MapReduce) in Pig. They are generated by NodeIdGenerator.getNextId().
>
> Because several lines of a Pig script can be compiled into a single
> MapReduce job, it is hard to associate this part of the plan with a line
> number in the script. The EXPLAIN command can help more here: it prints the
> logical, physical and MapReduce plans for an alias.
>
> A lot of Pig's functionality is implemented internally with UDFs, which show
> up as POUserFunc in the plan. Here is a snippet from the code explaining
> where and why we use the IsEmpty UDF:
> <snip>
> public static void addEmptyBagOuterJoin(PhysicalPlan fePlan, Schema inputSchema)
>         throws PlanException {
>     // we currently have POProject[bag] as the only operator in the plan
>     // If the bag is an empty bag, we should replace
>     // it with a bag with one tuple with null fields so that when we flatten
>     // we do not drop records (flatten will drop records if the bag is left
>     // as an empty bag) and actually project nulls for the fields in
>     // the empty bag
>
>     // So we need to get to the following state:
>     // POProject[Bag]
>     //      \
>     //   POUserFunc["IsEmpty()"]   Const[Bag](bag with null fields)
>     //              \                 |         POProject[Bag]
>     //               \                |         /
>     //                          POBinCond
> </snip>
> This explains the use of the IsEmpty() UDF.
>
> Hope it helps.
>
> Thanks,
> Aniket
>
> On Wed, June 16, 2010 2:52 pm, Corbin Hoenes wrote:
>> Is there any documentation on how to read this output? When I 'set debug
>> on', I get this in my reducer syslog:
>>
>> DEBUG: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - New For Each(true,true)[tuple] - 1-770
>> |   |
>> |   POBinCond[bag] - 1-768
>> |   |
>> |   |---Project[bag][1] - 1-764
>> |   |
>> |   |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - 1-766
>> |   |   |
>> |   |   |---Project[bag][1] - 1-765
>> |   |
>> |   |---Constant({()}) - 1-767
>> |   |
>> |   Project[bag][2] - 1-769
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45450 records, put the rest in spill file.
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45192 records, put the rest in spill file.
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 44852 records, put the rest in spill file
>>
>> Specifically, what do the 1-7** numbers mean? Is it possible to get line
>> numbers from the Pig script? :) Also strange: POUserFunc seems to be telling
>> me that we are running the IsEmpty UDF, but that UDF isn't called anywhere
>> in this script... Is it possible that Pig is using it under the covers?
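To make Aniket's explanation concrete, here is a minimal Java sketch of what the POBinCond/IsEmpty sub-plan achieves (presumably for an outer join, given the method name addEmptyBagOuterJoin). It is not Pig's implementation: the class and method names (EmptyBagOuterJoinSketch, nullPadIfEmpty, flatten) are hypothetical, bags are modelled as plain lists of tuples, and the width of the all-null tuple is assumed to match the empty side's schema.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Pig's code) of the IsEmpty/POBinCond sub-plan:
 * an empty bag is swapped for a bag holding one all-null tuple so that
 * FLATTEN keeps the record instead of dropping it.
 */
public class EmptyBagOuterJoinSketch {

    // Mirrors: IsEmpty(bag) ? {(null, ..., null)} : bag
    // 'width' is an assumption: the number of fields in the empty side's schema.
    static List<List<Object>> nullPadIfEmpty(List<List<Object>> bag, int width) {
        if (!bag.isEmpty()) {
            return bag;                  // non-empty bag passes through unchanged
        }
        List<Object> nullTuple = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            nullTuple.add(null);         // one null per field
        }
        List<List<Object>> padded = new ArrayList<>();
        padded.add(nullTuple);
        return padded;
    }

    // FLATTEN(left), FLATTEN(right): cross product of the two bags' tuples.
    // If either bag is empty, the product is empty and the record is lost.
    static List<List<Object>> flatten(List<List<Object>> left, List<List<Object>> right) {
        List<List<Object>> rows = new ArrayList<>();
        for (List<Object> l : left) {
            for (List<Object> r : right) {
                List<Object> row = new ArrayList<>(l);
                row.addAll(r);
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // One cogrouped record: the first bag is empty, the second holds ("k", 42).
        List<List<Object>> left = new ArrayList<>();
        List<List<Object>> right = new ArrayList<>();
        right.add(List.<Object>of("k", 42));

        System.out.println(flatten(left, right));                     // []
        System.out.println(flatten(nullPadIfEmpty(left, 2), right));  // [[null, null, k, 42]]
    }
}
```

Without the substitution, flattening the empty bag yields no rows and the record disappears; with it, the record survives with nulls for the missing side. That is why IsEmpty can appear in the plan even though the script never calls it.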