Re: UDF to calculate Average of whole dataset
dividends = load 'try.txt';
a = foreach dividends generate FLATTEN(TOBAG(*));
b = foreach (group a all) generate CalculateAvg($1);

I think that should work
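
If a custom UDF is not strictly required, the same average can probably be computed with Pig built-ins alone. The sketch below is an alternative to the CalculateAvg approach, not something proposed in this thread. It assumes try.txt contains only numbers separated by commas and spaces, and that your Pig version provides the built-in TextLoader, TOKENIZE (whose default delimiters include space and comma) and AVG. The relation and field names (lines, tokens, nums, tok, val) are made up for illustration.

```pig
-- Read each line of the file as a single chararray field.
lines  = LOAD 'try.txt' USING TextLoader() AS (line:chararray);

-- Split every line on the default TOKENIZE delimiters (space, comma, ...)
-- and flatten the resulting bag so each number becomes its own row.
tokens = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS tok;

-- Cast the text tokens to doubles; values that do not parse become null.
nums   = FOREACH tokens GENERATE (double)tok AS val;

-- Put all rows into one group and average them with the built-in AVG.
result = FOREACH (GROUP nums ALL) GENERATE AVG(nums.val);

DUMP result;
```

To use a custom UDF instead, the last FOREACH is where it would go: pass it the bag nums.val in place of AVG, after a REGISTER of the jar that contains it.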
2013/3/5 pablomar <[EMAIL PROTECTED]>

> What is the error?
> Function not found, or something like that?
>
> What about this?
> avg       = generate myudfs.CalculateAvg(dividends);
>
>
> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>
> > Hello All,
> >
> > I have a dataset like this:
> >
> >  0, 10.1, 20.1, 30, 40,
> >   50, 60, 70, 80.1, 1,
> >   2, 3, 4, 5, 6,
> >   7, 8, 9, 10, 11,
> >   12, 13, 14, 15, 16,
> >   1, 2, 3, 4, 5,
> >   56, 6, 7, 8, 9,
> >   9, 9, 9, 12, 1,
> >   3, 14, 1, 5, 6,
> >   7, 8, 8, 9, 12
> >
> > So basically comma-separated values. But I want to treat this as one
> > data column and calculate the average of the whole dataset.
> >
> > I believe I have to write a UDF to calculate the average. Pig is able to
> > load this data:
> >
> > (  0, 10.1, 20.1, 30, 40,)
> > (  50, 60, 70, 80.1, 1,)
> > (  2, 3, 4, 5, 6,)
> > (  7, 8, 9, 10, 11,)
> > (  12, 13, 14, 15, 16,)
> > (  1, 2, 3, 4, 5,)
> > (  56, 6, 7, 8, 9,)
> > (  9, 9, 9, 12, 1,)
> > (  3, 14, 1, 5, 6,)
> > (  7, 8, 8, 9, 12 )
> >
> > How do I invoke that UDF in my Pig script? Say I implement a
> > CalculateAvg function.
> >
> > REGISTER ./myudfs.jar
> > dividends = load 'try.txt';
> > dump dividends
> > --grouped   = group dividends by symbol;
> > avg       = generate CalculateAvg(dividends);
> > dump avg
> > --store avg into 'average_dividend';
> >
> > It fails.
> >
> >
>