Pig >> mail # user >> when Algebraic UDF are used ?


Re: when Algebraic UDF are used ?
According to http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html:

"Implementing Algebraic does not guarantee that the algebraic
implementation will always be used. Pig only chooses the algebraic
implementation if all UDFs in the same foreach statement are algebraic.
This is because our testing has shown that using the combiner with data
that cannot be combined significantly slows down the job. And there is no
way in Hadoop to route some data to the combiner (for algebraic functions)
and some straight to the reducer (for non-algebraic). This means that your
UDF must always implement the exec method, even if you hope it will always
be used in the algebraic mode. It is also an additional motivation to
implement algebraic for your UDFs when possible."
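
To make the quoted passage concrete, here is a minimal sketch of what "algebraic" means for a simple count. It is plain Java and deliberately does not use Pig's EvalFunc or Algebraic classes; the class name AlgebraicCountSketch and the methods initial, intermed, finalStep and viaAlgebraic are hypothetical names chosen for illustration. It only shows that a function which can be split into partial results gives the same answer as the single exec call, which is the property a combiner relies on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: plain Java, NOT Pig's EvalFunc/Algebraic API.
// All names here are hypothetical.
public class AlgebraicCountSketch {

    // Non-algebraic path: sees the whole bag for one key at once (like exec).
    static long exec(List<Integer> bag) {
        return bag.size();
    }

    // Map side: turn one chunk of input into a partial count.
    static long initial(List<Integer> chunk) {
        return chunk.size();
    }

    // Combiner side: merge partial counts into one partial count.
    static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Reduce side: merge the remaining partial counts into the final answer.
    static long finalStep(List<Long> partials) {
        return intermed(partials);
    }

    // Simulate the three-phase path by splitting the bag into chunks.
    static long viaAlgebraic(List<Integer> bag, int chunkSize) {
        List<Long> partials = new ArrayList<>();
        for (int i = 0; i < bag.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, bag.size());
            partials.add(initial(bag.subList(i, end)));
        }
        List<Long> combined = new ArrayList<>();
        combined.add(intermed(partials));
        return finalStep(combined);
    }

    public static void main(String[] args) {
        List<Integer> bag = Arrays.asList(4, 8, 15, 16, 23, 42);
        System.out.println("exec:      " + exec(bag));
        System.out.println("algebraic: " + viaAlgebraic(bag, 2));
    }
}
```

Both paths return the same count for the same bag. That is why, as the quote says, exec still has to be implemented: Pig may fall back to it whenever it decides not to use the combiner.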

On Wed, Jul 25, 2012 at 12:32 PM, Benoit Mathieu <[EMAIL PROTECTED]> wrote:

> Hi pig users,
>
> I have coded my own algebraic UDF in Java, and it seems that Pig does not use
> the algebraic interface at all. (I put some log messages in my
> Initial, Intermed and Final functions, and they are never logged.)
> Pig uses only the main "exec" function.
>
> My UDF needs the bag to be sorted.
> Here is my pig script:
>
> A = LOAD '...' USING PigStorage() AS (k1:int,k2:int,value:int);
> B = GROUP A BY k1;
> C = FOREACH B {
>   tmp = ORDER A.(k2,value) BY k2;
>   GENERATE group, MyUDF(tmp);
> }
> ...
>
>
> Does anyone know why Pig does not use the algebraic interface?
>
> thanks,
>
> Benoit
>