Pig, mail # dev - Function To Compute Product of Values in Bag


Sergey Goder 2013-05-03, 18:20
Kai Londenberg 2013-05-03, 18:42
Sergey Goder 2013-05-03, 20:36
Re: Function To Compute Product of Values in Bag
Julien Le Dem 2013-05-03, 22:46
As for PRODUCT, I don't see why it could not be added to the builtins.
It is a very generic, dependency-free function.
On Fri, May 3, 2013 at 1:36 PM, Sergey Goder <[EMAIL PROTECTED]> wrote:

> Thanks for the tip about numerical accuracy issues and the elegant solution
> exploiting log/exp. It is very much appreciated.
>
> Sergey
>
>
> On Fri, May 3, 2013 at 11:42 AM, Kai Londenberg <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > Just a hint: It's usually better to work with log probabilities and sum
> > over them than to work with raw probabilities and use
> > multiplication. You might easily run into numerical accuracy issues
> > otherwise.
> >
> > i.e. exploit this fact:
> >
> > product(x1, ..., xn) = exp(sum(log(x1), ..., log(xn)))
> >
> > best,
> >
> > Kai Londenberg
> >
> > 2013/5/3 Sergey Goder <[EMAIL PROTECTED]>:
> > > I'm creating a multinomial naive Bayes classifier using Pig and need to
> > > compute the product of probabilities. There is an arbitrary number of
> > > values in the bag, so I would like to be able to use a function similar
> > > to the builtin SUM to do this. I looked through the source code and found
> > > that with some really simple changes to SUM.java I can create a PROD.java
> > > function. I included it in my piggybank and have been using it
> > > successfully.
> > >
> > > I was curious what the community thought about including this function
> > > as a builtin function in a future release? Or would it make more sense
> > > to keep this function as a UDF in a piggybank?
> > >
> > > Thanks,
> > > Sergey
> >
>
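
For anyone reading this thread later, here is a minimal sketch of the log/exp identity Kai mentions, written in plain Python rather than as a Pig UDF. The function names (product, log_sum, product_via_logs) and the sample values are made up for illustration and are not taken from SUM.java or the PROD.java discussed above. It assumes every value is strictly positive, since log is undefined at zero.

```python
import math
import operator
from functools import reduce


def product(values):
    """Multiply the values directly, as a PROD-style fold over a bag would."""
    return reduce(operator.mul, values, 1.0)


def log_sum(values):
    """Sum of natural logs; every value must be strictly positive."""
    return math.fsum(math.log(v) for v in values)


def product_via_logs(values):
    """product(x1, ..., xn) = exp(sum(log(x1), ..., log(xn)))"""
    return math.exp(log_sum(values))


# Hypothetical sample data: three moderate probabilities and
# one hundred very small ones.
small = [0.5, 0.25, 0.125]
tiny = [1e-5] * 100

print(product(small), product_via_logs(small))  # the two agree closely
print(product(tiny))   # the direct product underflows to 0.0
print(log_sum(tiny))   # the log-space score is still a finite number
```

Note that exponentiating the log sum for the tiny values would underflow as well, so the benefit comes from staying in log space, for example by comparing per-class log scores in a naive Bayes classifier instead of converting back to raw probabilities.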