Re: Using Correlation and Covariance UDFs
Beware: you must first sort the input.

D = foreach B {
    sorted = order A by $0;
    generate group, COR(sorted.$0, sorted.$1, ...);
};

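Whichever order you use, COR only makes sense if the i-th value of every projected column comes from the same input row. Below is a minimal plain-Python sketch, not Pig and not the piggybank source. It assumes the UDF pairs values by position and returns one Pearson coefficient per pair of columns. `pearson` and `pairwise_correlation` are hypothetical names. The sketch shows that reordering whole rows, as `order A by $0` does, leaves the coefficient unchanged, while shuffling one column on its own changes it. It does not capture any extra ordering requirement that the real UDF may have.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, pairing xs[i] with ys[i]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-deviation of each position-aligned pair; misaligned rows change this sum.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


def pairwise_correlation(rows):
    """Hypothetical COR-style helper: one coefficient per column pair (i, j), i < j.

    `rows` plays the role of the tuples in the grouped bag A.
    """
    columns = list(zip(*rows))  # columns[k] is like the projection A.$k
    return {
        (i, j): pearson(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


if __name__ == "__main__":
    # Toy rows (f1, f2): f2 is exactly 2 * f1, so the pairs are perfectly correlated.
    rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(pairwise_correlation(rows))        # {(0, 1): 1.0}

    # Permuting whole rows keeps each (f1, f2) pair together: still 1.0.
    print(pairwise_correlation(rows[::-1]))  # {(0, 1): 1.0}

    # Shuffling one column on its own breaks the pairing.
    f1 = [r[0] for r in rows]
    f2_shuffled = [4.0, 2.0, 8.0, 6.0]
    print(pearson(f1, f2_shuffled))          # 0.6
```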
On Tue, Mar 26, 2013 at 5:11 PM, Johnny Zhang <[EMAIL PROTECTED]> wrote:

> Hi, Renato:
> For CORRELATION, I guess you can do something like this:
> A = load 'random.txt' using PigStorage(':') as
> (f1:double, f2:double, ..., f500:double);
> B = group A all;
> D = foreach B generate group, COR(A.$0, A.$1, A.$2, A.$3, ..., A.$499);
>
> For COVARIANCE, I guess the UDF is COV.
>
> Johnny
>
>
> On Tue, Mar 26, 2013 at 3:28 PM, Renato Marroquín Mogrovejo <
> [EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > Could anyone be kind enough to point me to some examples of using the
> > COVARIANCE and CORRELATION UDFs described here? [1]
> >
> >
> > Renato M.
> >
> >
> > [1] https://issues.apache.org/jira/browse/PIG-277
> >
>
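This thread only names COV. If it is called like COR, which is an assumption, it would take the same projected columns of the grouped bag. The plain-Python sketch below computes the statistic itself, one covariance per column pair. `covariance` and `pairwise_covariance` are hypothetical names. By default it uses the sample (n - 1) denominator. The thread does not say whether the COV UDF divides by n or by n - 1, so check before comparing numbers.

```python
from itertools import combinations


def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences, pairing xs[i] with ys[i].

    sample=True divides by n - 1; sample=False divides by n (population).
    Which convention the Pig COV UDF uses is an assumption to verify.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (n - 1 if sample else n)


def pairwise_covariance(rows, sample=True):
    """Hypothetical COV-style helper: one covariance per column pair (i, j), i < j."""
    columns = list(zip(*rows))  # columns[k] is like the projection A.$k
    return {
        (i, j): covariance(columns[i], columns[j], sample)
        for i, j in combinations(range(len(columns)), 2)
    }


if __name__ == "__main__":
    rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(pairwise_covariance(rows))                # {(0, 1): 3.3333333333333335}
    print(pairwise_covariance(rows, sample=False))  # {(0, 1): 2.5}
```

Dividing this covariance by the product of the two columns' standard deviations, computed with the same denominator, gives the Pearson coefficient that COR reports.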

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com