Pig >> mail # user >> UDF to calculate Average of whole dataset


Re: UDF to calculate Average of whole dataset
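What is the error? Is it "function not found" or something like that?

What about this? Qualify the UDF with its package name, and put the generate inside a foreach over the whole relation, since generate cannot stand on its own:
avg       = foreach (group dividends all) generate myudfs.CalculateAvg(dividends);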
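
In case it helps with the UDF body, here is a minimal sketch of the arithmetic CalculateAvg would need, written as plain Java with no Pig dependency so it can be tried on its own. The class name WholeDatasetAverage and the method average are hypothetical, and treating each row as one comma-separated string is an assumption about how the data arrives. It is not a Pig EvalFunc, only the averaging logic such a function might wrap.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: shows only the arithmetic a
// CalculateAvg UDF would perform over every value in every line.
final class WholeDatasetAverage {

    // Returns the mean of all comma-separated numbers across all lines.
    // Empty fields (for example after a trailing comma) are skipped.
    static double average(List<String> lines) {
        double sum = 0.0;
        long count = 0;
        for (String line : lines) {
            for (String field : line.split(",")) {
                String trimmed = field.trim();
                if (trimmed.isEmpty()) {
                    continue; // blank line or stray separator
                }
                sum += Double.parseDouble(trimmed);
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no numeric values found");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // First two rows of the dataset from the original message.
        List<String> rows = Arrays.asList(
                "0, 10.1, 20.1, 30, 40,",
                "50, 60, 70, 80.1, 1,");
        System.out.println(average(rows));
    }
}
```

Summing and counting across all rows, rather than averaging per-row averages, keeps the result correct even if some rows have fewer values than others.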
On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:

> Hello All,
>
> I have a dataset like
>
>  0, 10.1, 20.1, 30, 40,
>   50, 60, 70, 80.1, 1,
>   2, 3, 4, 5, 6,
>   7, 8, 9, 10, 11,
>   12, 13, 14, 15, 16,
>   1, 2, 3, 4, 5,
>   56, 6, 7, 8, 9,
>   9, 9, 9, 12, 1,
>   3, 14, 1, 5, 6,
>   7, 8, 8, 9, 12
>
> So basically comma-separated values. But I want to treat this as one
> data column and calculate the average of the whole dataset.
>
> I believe I have to write a UDF to calculate the average. Pig is able to load
> this data:
>
> (  0, 10.1, 20.1, 30, 40,)
> (  50, 60, 70, 80.1, 1,)
> (  2, 3, 4, 5, 6,)
> (  7, 8, 9, 10, 11,)
> (  12, 13, 14, 15, 16,)
> (  1, 2, 3, 4, 5,)
> (  56, 6, 7, 8, 9,)
> (  9, 9, 9, 12, 1,)
> (  3, 14, 1, 5, 6,)
> (  7, 8, 8, 9, 12 )
>
> And how do I invoke that UDF in my Pig script? Say I implement
> a CalculateAvg function.
>
> REGISTER ./myudfs.jar
> dividends = load 'try.txt';
> dump dividends
> --grouped   = group dividends by symbol;
> avg       = generate CalculateAvg(dividends);
> dump avg
> --store avg into 'average_dividend';
>
> It fails.
>
>