Thanks!
On Jun 16, 2010, at 2:54 PM, "Aniket Mokashi"
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> This is a representation of Pig's physical plan of execution. You can
> read more about it at:
>
> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#EXPLAIN
> http://wiki.apache.org/pig/PigExecutionModel
>
> The 7** numbers are ids that uniquely identify operators
> (Logical/Physical/MR) in Pig [NodeIdGenerator.getNextId()].
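To make the id format concrete, here is a minimal sketch that pulls the trailing id off an operator line from the debug output quoted further down. The class and method names are hypothetical and not part of Pig. Reading the number before the dash as a scope and the number after it as the counter value is an assumption based only on the shape of the log lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: extracts the trailing id from a plan line.
class OperatorIdSketch {
    // Matches a trailing " - <number>-<number>" token such as " - 1-768".
    private static final Pattern TRAILING_ID = Pattern.compile(" - (\\d+)-(\\d+)\\s*$");

    // Returns {scope, id} parsed from a plan line, or null if the line carries no id
    // (for example the "|   |" connector lines of the tree).
    // Assumption: the first number is a scope and the second is the operator counter.
    static long[] parseId(String planLine) {
        Matcher m = TRAILING_ID.matcher(planLine);
        if (!m.find()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    public static void main(String[] args) {
        long[] key = parseId("|   POBinCond[bag] - 1-768");
        System.out.println("scope=" + key[0] + " id=" + key[1]); // scope=1 id=768
    }
}
```

The ids only tell operators apart within a plan; they do not map back to script line numbers.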
>
> Because multiple lines of a Pig script can be compiled into a single
> MapReduce job, it is hard to associate this part of the plan with a
> line number in the Pig script. "Explain" can help you more here.
>
> A lot of functionality in Pig is implemented with user-defined
> functions (UDFs). Here is a snippet from the code explaining where and
> why the IsEmpty UDF is used:
> <snip>
> public static void addEmptyBagOuterJoin(PhysicalPlan fePlan, Schema inputSchema) throws PlanException {
>     // we currently have POProject[bag] as the only operator in the plan
>     // If the bag is an empty bag, we should replace
>     // it with a bag with one tuple with null fields so that when we flatten
>     // we do not drop records (flatten will drop records if the bag is left
>     // as an empty bag) and actually project nulls for the fields in
>     // the empty bag
>
>     // So we need to get to the following state:
>     // POProject[Bag]
>     //         \
>     //    POUserFunc["IsEmpty()"] Const[Bag](bag with null fields)
>     //                        \      |    POProject[Bag]
>     //                         \     |    /
>     //                          POBinCond
> </snip>
> This explains the use of the IsEmpty() UDF.
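The effect described in the snippet's comments can be sketched in plain Java, without any Pig classes. This is only an illustration of the semantics, assuming a bag is modelled as a list of tuples and a tuple as a list of fields. The names `padIfEmpty` and `flatten` are hypothetical stand-ins for the IsEmpty/POBinCond pair and for Pig's flatten; none of this is Pig's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Pig code: bags are lists of tuples,
// tuples are lists of fields.
class EmptyBagFlattenSketch {

    // Stand-in for IsEmpty + POBinCond: if the bag is empty, substitute a bag
    // holding one tuple of `arity` null fields; otherwise pass the bag through.
    static List<List<Object>> padIfEmpty(List<List<Object>> bag, int arity) {
        if (bag.isEmpty()) {
            List<Object> nullTuple = new ArrayList<>(Collections.<Object>nCopies(arity, null));
            List<List<Object>> padded = new ArrayList<>();
            padded.add(nullTuple);
            return padded;
        }
        return bag;
    }

    // Stand-in for flatten: emit one output row per tuple in the bag,
    // so an empty bag produces no rows at all.
    static List<List<Object>> flatten(Object key, List<List<Object>> bag) {
        List<List<Object>> rows = new ArrayList<>();
        for (List<Object> tuple : bag) {
            List<Object> row = new ArrayList<>();
            row.add(key);
            row.addAll(tuple);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Object>> empty = new ArrayList<>();
        // Without padding, the record for key "k1" is dropped.
        System.out.println(flatten("k1", empty));                // []
        // With padding, the record survives with null fields.
        System.out.println(flatten("k1", padIfEmpty(empty, 2))); // [[k1, null, null]]
    }
}
```

This is why IsEmpty can show up in the plan for an outer join even when the script never calls it.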
>
> Hope it helps.
>
> Thanks,
> Aniket
>
> On Wed, June 16, 2010 2:52 pm, Corbin Hoenes wrote:
>> Is there any documentation on how to read this output? When I 'set
>> debug on' I get this in my reducer syslog:
>>
>> DEBUG: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - New For Each(true,true)[tuple] - 1-770
>> |   |
>> |   POBinCond[bag] - 1-768
>> |   |
>> |   |---Project[bag][1] - 1-764
>> |   |
>> |   |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - 1-766
>> |   |   |
>> |   |   |---Project[bag][1] - 1-765
>> |   |
>> |   |---Constant({()}) - 1-767
>> |   |
>> |   Project[bag][2] - 1-769
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45450 records, put the rest in spill file.
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 45192 records, put the rest in spill file.
>> DEBUG: org.apache.pig.data.InternalCachedBag - Memory can hold 44852 records, put the rest in spill file
>>
>>
>> Specifically, what do the 1-7** numbers mean? Is it possible to get
>> line numbers from the Pig script? :) It also seems strange that
>> POUserFunc is telling me we are running the IsEmpty UDF, but that UDF
>> isn't called anywhere in this script. Is it possible Pig is using it
>> under the covers?