
Pig, mail # user - UDF to calculate Average of whole dataset


Preeti Gupta 2013-03-04, 21:56
Re: UDF to calculate Average of whole dataset
pablomar 2013-03-05, 02:26
What is the error?
Function not found, or something like that?

What about this?
avg       = generate myudfs.CalculateAvg(dividends);
On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:

> Hello All,
>
> I have a dataset like this:
>
>  0, 10.1, 20.1, 30, 40,
>   50, 60, 70, 80.1, 1,
>   2, 3, 4, 5, 6,
>   7, 8, 9, 10, 11,
>   12, 13, 14, 15, 16,
>   1, 2, 3, 4, 5,
>   56, 6, 7, 8, 9,
>   9, 9, 9, 12, 1,
>   3, 14, 1, 5, 6,
>   7, 8, 8, 9, 12
>
> So these are basically comma-separated values, but I want to treat them as
> one data column and calculate the average of the whole dataset.
>
> I believe I have to write a UDF to calculate the average. Pig is able to
> load this data:
>
> (  0, 10.1, 20.1, 30, 40,)
> (  50, 60, 70, 80.1, 1,)
> (  2, 3, 4, 5, 6,)
> (  7, 8, 9, 10, 11,)
> (  12, 13, 14, 15, 16,)
> (  1, 2, 3, 4, 5,)
> (  56, 6, 7, 8, 9,)
> (  9, 9, 9, 12, 1,)
> (  3, 14, 1, 5, 6,)
> (  7, 8, 8, 9, 12 )
>
> And how do I invoke that UDF in my Pig script? Say I implement a
> CalculateAvg function.
>
> REGISTER ./myudfs.jar
> dividends = load 'try.txt';
> dump dividends
> --grouped   = group dividends by symbol;
> avg       = generate CalculateAvg(dividends);
> dump avg
> --store avg into 'average_dividend';
>
> It fails.
>
>
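A real Pig UDF would extend Pig's EvalFunc and read its input from a DataBag, which needs the Pig jar on the classpath. The sketch below leaves that wrapper out and shows only the arithmetic a CalculateAvg like the one described might perform. It assumes, based on the dump in the message, that each loaded tuple holds one whole line as a single field (Pig's default PigStorage loader splits fields on tabs, not commas), so the numbers have to be pulled out of the text. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical standalone sketch: the arithmetic a CalculateAvg UDF might perform,
// without Pig's EvalFunc/DataBag wrapper. Assumes each input string is one whole
// loaded line, e.g. "  0, 10.1, 20.1, 30, 40,".
public class CalculateAvgSketch {

    // Returns the mean of every number found across all lines, or NaN if none were found.
    static double averageOfLines(List<String> lines) {
        double sum = 0.0;
        long count = 0;
        for (String line : lines) {
            // String.split drops trailing empty strings, so a trailing comma adds no token.
            for (String token : line.split(",")) {
                String trimmed = token.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blanks such as ",," or a whitespace-only line
                }
                sum += Double.parseDouble(trimmed);
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // The ten sample lines from the message above.
        List<String> lines = List.of(
                "  0, 10.1, 20.1, 30, 40,",
                "  50, 60, 70, 80.1, 1,",
                "  2, 3, 4, 5, 6,",
                "  7, 8, 9, 10, 11,",
                "  12, 13, 14, 15, 16,",
                "  1, 2, 3, 4, 5,",
                "  56, 6, 7, 8, 9,",
                "  9, 9, 9, 12, 1,",
                "  3, 14, 1, 5, 6,",
                "  7, 8, 8, 9, 12 ");
        System.out.println(averageOfLines(lines));
    }
}
```

Over the ten sample lines this works out to 710.3 / 50 values, roughly 14.206. To hand the whole relation to a UDF in one call, Pig scripts typically group everything into a single bag first, e.g. `grouped = GROUP dividends ALL;` followed by `avg = FOREACH grouped GENERATE myudfs.CalculateAvg(dividends);`, since a bare `generate` outside a `FOREACH` is not valid Pig Latin.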
Jonathan Coveney 2013-03-05, 11:17
Preeti Gupta 2013-03-05, 21:04
pablomar 2013-03-05, 21:09
Preeti Gupta 2013-03-05, 21:24
inelu nagamallikarjuna 2013-03-05, 22:12
inelu nagamallikarjuna 2013-03-05, 22:49