


Function To Compute Product of Values in Bag
I'm creating a multinomial Naive Bayes classifier using Pig and need to compute the product of probabilities. There are an arbitrary number of values in the bag, so I would like to use a function similar to the builtin SUM to do this. I looked through the source code and found that with some really simple changes to SUM.java I can create a PROD.java function. I included it in my piggybank and have been using it successfully.
What does the community think about including this function as a builtin in a future release? Or would it make more sense to keep it as a UDF in piggybank?
Thanks, Sergey
Sergey Goder 20130503, 18:20

Re: Function To Compute Product of Values in Bag
Hi,
Just a hint: it's usually better to work with log probabilities and sum them than to work with raw probabilities and multiply them. Otherwise you can easily run into numerical accuracy issues, since the product of many small probabilities can underflow to zero.
That is, exploit this identity (valid when every xi is positive):
product(x1, ..., xn) = exp(sum(log(x1), ..., log(xn)))
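As a rough illustration of why this matters, here is a minimal Python sketch (not Pig, and not the PROD.java code from this thread). The function names `naive_product` and `log_sum` and the sample values are made up for the example. It assumes every probability is strictly positive, since log(0) is undefined.

```python
import math

def naive_product(probs):
    """Multiply the raw probabilities directly (can underflow to 0.0)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def log_sum(probs):
    """Sum the log probabilities; assumes every p > 0."""
    return math.fsum(math.log(p) for p in probs)

# A short bag: both approaches agree.
small = [0.5, 0.25, 0.1]
small_naive = naive_product(small)
small_via_logs = math.exp(log_sum(small))

# A long bag of tiny values: the true product (1e-500) is below the
# smallest representable double, so the naive product collapses to 0.0,
# while the log-space score is still a perfectly ordinary number.
tiny = [1e-5] * 100
tiny_naive = naive_product(tiny)
tiny_log_score = log_sum(tiny)
```

For classification you usually do not need to call exp at all: comparing the summed log scores across classes gives the same ranking as comparing the products.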
best,
Kai Londenberg
Kai Londenberg 20130503, 18:42

Re: Function To Compute Product of Values in Bag
Thanks for the tip about numerical accuracy issues and the elegant solution exploiting log/exp. It is very much appreciated.
Sergey
Sergey Goder 20130503, 20:36

Re: Function To Compute Product of Values in Bag
As for PRODUCT, I don't see why it could not be added as a builtin. It is a very generic function with no dependencies.
Julien Le Dem 20130503, 22:46

