

Re: when Algebraic UDF are used ?
According to http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html:
"Implementing Algebraic does not guarantee that the algebraic implementation will always be used. Pig only chooses the algebraic implementation if all UDFs in the same foreach statement are algebraic. This is because our testing has shown that using the combiner with data that cannot be combined significantly slows down the job. And there is no way in Hadoop to route some data to the combiner (for algebraic functions) and some straight to the reducer (for nonalgebraic). This means that your UDF must always implement the exec method, even if you hope it will always be used in the algebraic mode. It is also an additional motivation to implement algebraic for your UDFs when possible."

On Wed, Jul 25, 2012 at 12:32 PM, Benoit Mathieu <[EMAIL PROTECTED]> wrote:
> Hi pig users,
>
> I have coded my own algebraic UDF in Java, and it seems that pig do not use
> the algebraic interface at all. (I put some log messages in my
> Initial,Intermed and Final functions, and they're never logged).
> Pig uses only the main "exec" function.
>
> My UDF needs to get the bag sorted.
> Here is my pig script:
>
> A = LOAD '...' USING PigStorage() AS (k1:int,k2:int,value:int);
> B = GROUP A BY k1;
> C = FOREACH B {
>   tmp = ORDER A.(k2,value) BY k2;
>   GENERATE group, MyUDF(tmp);
> }
> ...
>
>
> Does anyone know why pig does not use the algebraic interface ?
>
> thanks,
>
> Benoit
>
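
To make the quoted point about exec concrete, here is a rough plain-Java sketch of why the two paths have to agree. It does not use Pig's real classes or interfaces: AlgebraicCountSketch and its method names are made up for illustration and only mirror the Initial, Intermed and Final stages mentioned in the thread. It uses a simple count, which splits cleanly into partial results.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, no dependency on Pig.
final class AlgebraicCountSketch {

    // Non-algebraic path: the whole bag for one key arrives in a single call.
    static long exec(List<Integer> bag) {
        return bag.size();
    }

    // Initial stage: turn one chunk of raw values into a partial count.
    static long initial(List<Integer> chunk) {
        return chunk.size();
    }

    // Intermediate stage: merge partial counts into a new partial count.
    static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final stage: merge the partial counts that reach the last step.
    static long finalStage(List<Long> partials) {
        return intermed(partials);
    }

    public static void main(String[] args) {
        List<Integer> bag = Arrays.asList(7, 3, 9, 1, 4);

        // Path 1: everything in one call.
        long direct = exec(bag);

        // Path 2: the same values split into two chunks, then merged.
        long p1 = initial(bag.subList(0, 2));
        long p2 = initial(bag.subList(2, 5));
        long combined = intermed(Arrays.asList(p1, p2));
        long staged = finalStage(Arrays.asList(combined));

        System.out.println(direct + " " + staged); // prints "5 5"
    }
}
```

Whichever path is chosen, the result has to be the same, which is why exec cannot be skipped. A function that needs its whole input bag in sorted order may not split into partial results this neatly.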

