Pig >> mail # user >> Using Correlation and Covariance UDFs


+
Renato Marroquín Mogrovej... 2013-03-26, 22:28
+
Johnny Zhang 2013-03-27, 00:11
Re: Using Correlation and Covariance UDFs
Beware: you must first sort the input.

D = foreach B { sorted = order A by $0; generate group, COR(sorted.$0,
sorted.$1, ... ); };
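
A fuller sketch of the same pattern on a hypothetical two-column file, in case it helps. The file name pairs.txt, the ':' delimiter, and the field names x and y are assumptions for illustration, not taken from the original data; COR and COV are the Pig builtins mentioned in this thread.

```pig
-- Assumed input: 'pairs.txt', one "x:y" pair of doubles per line (hypothetical).
A = LOAD 'pairs.txt' USING PigStorage(':') AS (x:double, y:double);

-- Put every row into a single group so the UDFs see the whole relation.
B = GROUP A ALL;

-- Sort the grouped bag first, then pass the projected columns to the UDFs.
D = FOREACH B {
    sorted = ORDER A BY x;
    GENERATE group, COR(sorted.x, sorted.y), COV(sorted.x, sorted.y);
};

DUMP D;
```

With GROUP ... ALL the whole relation ends up in one group, so the nested ORDER runs in a single reducer; that is probably fine for small inputs but may be slow on large ones.
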
On Tue, Mar 26, 2013 at 5:11 PM, Johnny Zhang <[EMAIL PROTECTED]> wrote:

> Hi, Renato:
> For CORRELATION, I guess you can do something like:
> A = load 'random.txt' using PigStorage(':') as
> (f1:double, f2:double, ..., f500:double);
> B = group A all;
> D = foreach B generate group, COR(A.$0, A.$1, A.$2, A.$3, ..., A.$499);
>
> For COVARIANCE, I guess the UDF is COV.
>
> Johnny
>
>
> On Tue, Mar 26, 2013 at 3:28 PM, Renato Marroquín Mogrovejo <
> [EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > Could anyone be kind enough to point me to some examples of using the
> > COVARIANCE and CORRELATION UDFs described here? [1]
> >
> >
> > Renato M.
> >
> >
> > [1] https://issues.apache.org/jira/browse/PIG-277
> >
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
+
Houssam 2013-03-27, 13:15
+
Russell Jurney 2013-03-27, 18:30
+
Renato Marroquín Mogrovej... 2013-03-28, 21:41