Pig user mailing list: UDF to calculate Average of whole dataset


Preeti Gupta 2013-03-04, 21:56
pablomar 2013-03-05, 02:26
Jonathan Coveney 2013-03-05, 11:17
Preeti Gupta 2013-03-05, 21:04
Re: UDF to calculate Average of whole dataset
Did you try {jarFileName}.{FunctionName}?
For example: myudfs.CalculateAvg
On Tue, Mar 5, 2013 at 4:04 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:

> I kept the code in myudfs.jar, and my Pig script points to it using the
> REGISTER command, but the script is not able to find the CalculateAvg function.
> I don't have any packages defined in the Java file, and the jar is in my
> current directory.
>
>
> On Mar 5, 2013, at 3:17 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > dividends = load 'try.txt';
> > a = foreach dividends generate FLATTEN(TOBAG(*));
> > b = foreach (group a all) generate CalculateAvg($1);
> >
> > I think that should work
> >
> >
> > 2013/3/5 pablomar <[EMAIL PROTECTED]>
> >
> >> what is the error ?
> >> function not found or something like that ?
> >>
> >> what about this ?
> >> avg       = generate myudfs.CalculateAvg(dividends);
> >>
> >>
> >> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hello All,
> >>>
> >>> I have a dataset like
> >>>
> >>> 0, 10.1, 20.1, 30, 40,
> >>>  50, 60, 70, 80.1, 1,
> >>>  2, 3, 4, 5, 6,
> >>>  7, 8, 9, 10, 11,
> >>>  12, 13, 14, 15, 16,
> >>>  1, 2, 3, 4, 5,
> >>>  56, 6, 7, 8, 9,
> >>>  9, 9, 9, 12, 1,
> >>>  3, 14, 1, 5, 6,
> >>>  7, 8, 8, 9, 12
> >>>
> >>> So these are basically comma-separated values, but I want to treat them as
> >>> one data column and calculate the average of the whole dataset.
> >>>
> >>> I believe I have to write a UDF to calculate the average. Pig is able to
> >>> load this data:
> >>>
> >>> (  0, 10.1, 20.1, 30, 40,)
> >>> (  50, 60, 70, 80.1, 1,)
> >>> (  2, 3, 4, 5, 6,)
> >>> (  7, 8, 9, 10, 11,)
> >>> (  12, 13, 14, 15, 16,)
> >>> (  1, 2, 3, 4, 5,)
> >>> (  56, 6, 7, 8, 9,)
> >>> (  9, 9, 9, 12, 1,)
> >>> (  3, 14, 1, 5, 6,)
> >>> (  7, 8, 8, 9, 12 )
> >>>
> >>> How do I invoke that UDF in my Pig script? Say I implement
> >>> a CalculateAvg function.
> >>>
> >>> REGISTER ./myudfs.jar
> >>> dividends = load 'try.txt';
> >>> dump dividends
> >>> --grouped   = group dividends by symbol;
> >>> avg       = generate CalculateAvg(dividends);
> >>> dump avg
> >>> --store avg into 'average_dividend';
> >>>
> >>> It fails.
> >>>
> >>>
> >>
>
>
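For reference, the averaging itself is small. Below is a minimal sketch in plain Java with no Pig dependency, assuming every row reaches the function as a line of comma-separated text with possible trailing commas, as in the loaded tuples quoted above. The class name DatasetAverage and the method average are hypothetical and do not come from the thread; in an actual UDF this logic would sit inside the exec method of a class extending org.apache.pig.EvalFunc, which the thread does not show.

```java
import java.util.List;

// Hypothetical helper, not part of Pig: averages every number found in a
// list of comma-separated lines, treating the whole input as one column.
final class DatasetAverage {

    static double average(List<String> lines) {
        double sum = 0.0;
        long count = 0;
        for (String line : lines) {
            for (String field : line.split(",")) {
                String value = field.trim();
                if (value.isEmpty()) {
                    continue; // blank field, e.g. from an empty line
                }
                sum += Double.parseDouble(value);
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no numeric values found");
        }
        return sum / count;
    }
}
```

Blank fields are skipped rather than counted as zero, so the trailing commas in the sample rows do not skew the result.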
Preeti Gupta 2013-03-05, 21:24
inelu nagamallikarjuna 2013-03-05, 22:12
inelu nagamallikarjuna 2013-03-05, 22:49