Pig >> mail # user >> Using Correlation and Covariance UDFs


Re: Using Correlation and Covariance UDFs
Some UDFs rely on sorted input, but it looks like I could be mistaken here.
I think this used to be the case in piggybank, but perhaps no longer?
On Wed, Mar 27, 2013 at 6:15 AM, Houssam <[EMAIL PROTECTED]> wrote:

> Hi Russell,
>
> I know what Johnny wrote is correct. But out of curiosity, why would you
> need to sort the input? Thanks!
>
> Houssam
>
> On Wed, Mar 27, 2013 at 2:04 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
> > Beware: you must first sort the input.
> >
> > D = foreach B { sorted = order A by $0; generate group, COR(sorted.$0,
> > sorted.$1, ... ); };
> >
> > On Tue, Mar 26, 2013 at 5:11 PM, Johnny Zhang <[EMAIL PROTECTED]> wrote:
> >
> > > Hi, Renato:
> > > For CORRELATION, I guess you can do something like
> > > A = load 'random.txt' using PigStorage(':') as
> > > (f1:double,f2:double,.........,f500:double);
> > > B = group A all;
> > > D = foreach B generate group,COR(A.$0,A.$1,A.$2,A.$3,.......A.$499);
> > >
> > > For COVARIANCE, I guess the UDF is COV.
> > >
> > > Johnny
> > >
> > >
> > > On Tue, Mar 26, 2013 at 3:28 PM, Renato Marroquín Mogrovejo <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Could anyone be kind enough to point me to some examples of using the
> > > > COVARIANCE and CORRELATION UDFs described here [1]?
> > > >
> > > >
> > > > Renato M.
> > > >
> > > >
> > > > [1] https://issues.apache.org/jira/browse/PIG-277
> > > >
> > >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> > datasyndrome.com
> >
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
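
On Houssam's question about sorting: as a formula, Pearson correlation (and covariance) depends only on which values are paired together, not on the order of the rows. Whether the COR implementation itself needs sorted input is not settled in this thread. Below is a minimal Python sketch of the math only, not of Pig's COR or COV code; the sample rows and the helper names (covariance, correlation, cor_of) are made up for illustration.

```python
import math
import random


def covariance(xs, ys):
    """Population covariance of two equal-length sequences of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def cor_of(rows):
    """Correlation between the first and second field of each row."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    return correlation(xs, ys)


# Hypothetical rows standing in for (f1, f2) tuples in the grouped bag.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]

original = cor_of(rows)

# Reordering whole rows keeps each (f1, f2) pair intact.
shuffled_rows = rows[:]
random.Random(0).shuffle(shuffled_rows)
shuffled = cor_of(shuffled_rows)

# Sorting whole rows by the first field also keeps pairs intact.
by_first = cor_of(sorted(rows, key=lambda r: r[0]))

# Sorting each column on its own breaks the pairing and changes the result.
independent = correlation(sorted(r[0] for r in rows), sorted(r[1] for r in rows))

print(original, shuffled, by_first, independent)
```

The first three values agree up to floating-point rounding, while the last one differs. So an "order ... by $0" over whole tuples should not change the mathematical result; what matters is that the columns passed to the function stay aligned row by row.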