Pig >> mail # user >> UDF to calculate Average of whole dataset


Preeti Gupta 2013-03-04, 21:56
pablomar 2013-03-05, 02:26
Jonathan Coveney 2013-03-05, 11:17
Preeti Gupta 2013-03-05, 21:04
pablomar 2013-03-05, 21:09
Preeti Gupta 2013-03-05, 21:24
Re: UDF to calculate Average of whole dataset
Hi,

Use the fully qualified class name, e.g. org.apache.udf.myudf.udfName, in the
Pig script when you call the UDF.
Otherwise, use only the UDF name in the script and pass the package on the
command line when you run it, e.g.
pig -Dudf.import.list=org.apache.udf.myudf.evaluation.string scriptname.pig
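
To make the lookup concrete, here is a rough Java sketch of prefix-based name
resolution. It is a simplified illustration and not Pig's actual code: the
class name UdfNameLookup and the method resolve are made up for this example,
and java.util stands in for a UDF package so the snippet runs without Pig on
the classpath.

```java
import java.util.List;
import java.util.Optional;

// Simplified illustration of prefix-based class lookup; NOT Pig's source code.
// UdfNameLookup and resolve are hypothetical names used only for this sketch.
final class UdfNameLookup {

    // Try each import prefix in order and return the first class that loads.
    static Optional<Class<?>> resolve(String name, List<String> imports) {
        for (String prefix : imports) {
            try {
                return Optional.of(Class.forName(prefix + name));
            } catch (ClassNotFoundException e) {
                // Nothing under this prefix; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in import list: "" (default package) plus one real JDK package.
        List<String> imports = List.of("", "java.util.");

        // A bare name resolves once some prefix completes it.
        System.out.println(resolve("ArrayList", imports));
        // A fully qualified name resolves through the empty prefix.
        System.out.println(resolve("java.util.ArrayList", imports));
        // A package prefix that does not exist matches nothing,
        // which is the situation behind myudf.CalculateAvg in ERROR 1070.
        System.out.println(resolve("myudf.CalculateAvg", imports));
    }
}
```

In this thread the jar listing shows CalculateAvg.class at the top level of the
jar and the import list in the error starts with an empty prefix, so calling the
UDF as plain CalculateAvg(...) instead of myudf.CalculateAvg(...) is probably
the first thing to try.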
Thanks
Nagamallikarjuna

On Wed, Mar 6, 2013 at 2:54 AM, Preeti Gupta <[EMAIL PROTECTED]> wrote:

> Nope. It does not work
>
> 2013-03-05 13:22:28,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1070: Could not resolve myudf.CalculateAvg using imports: [,
> org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> Details at logfile:
> /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362518535200.log
> ~
>
> Pig script
>
> REGISTER ./myudfs.jar;
> dividends = load 'myfile' as (A);
> dump dividends
> --grouped   = filter dividends by A>-10000000.0;
> --avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);
> avg = foreach (group dividends all) generate myudf.CalculateAvg(dividends);
> dump avg
>
> My jar file
>
> bash-3.2# vi a.txt
>
>      0 Mon Mar 04 13:45:44 PST 2013 META-INF/
>     60 Mon Mar 04 13:45:44 PST 2013 META-INF/MANIFEST.MF
>   1190 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Final.class
>   1306 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Initial.class
>   1477 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Intermediate.class
>   4205 Mon Mar 04 13:45:16 PST 2013 CalculateAvg.class
> ~
>
> On Mar 5, 2013, at 1:09 PM, pablomar <[EMAIL PROTECTED]> wrote:
>
> > did you try with {jarFileName}.{FunctionName} ?
> > example: myudfs.CalculateAvg ?
> >
> >
> > On Tue, Mar 5, 2013 at 4:04 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
> >
> >> I kept the code in myudfs.jar and my Pig script points to it using the
> >> REGISTER command, but the script is not able to find the CalculateAvg
> >> function. I don't have any packages defined in the Java file, and the
> >> jar is in my current directory.
> >>
> >>
> >> On Mar 5, 2013, at 3:17 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >>
> >>> dividends = load 'try.txt';
> >>> a = foreach dividends generate FLATTEN(TOBAG(*));
> >>> b = foreach (group a all) generate CalculateAvg($1);
> >>>
> >>> I think that should work
> >>>
> >>>
> >>> 2013/3/5 pablomar <[EMAIL PROTECTED]>
> >>>
> >>>> what is the error ?
> >>>> function not found or something like that ?
> >>>>
> >>>> what about this ?
> >>>> avg       = generate myudfs.CalculateAvg(dividends);
> >>>>
> >>>>
> >>>> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Hello All,
> >>>>>
> >>>>> I have dataset like
> >>>>>
> >>>>> 0, 10.1, 20.1, 30, 40,
> >>>>> 50, 60, 70, 80.1, 1,
> >>>>> 2, 3, 4, 5, 6,
> >>>>> 7, 8, 9, 10, 11,
> >>>>> 12, 13, 14, 15, 16,
> >>>>> 1, 2, 3, 4, 5,
> >>>>> 56, 6, 7, 8, 9,
> >>>>> 9, 9, 9, 12, 1,
> >>>>> 3, 14, 1, 5, 6,
> >>>>> 7, 8, 8, 9, 12
> >>>>>
> >>>>> So basically comma-separated values. But I want to treat this as one
> >>>>> data column and calculate the average of the whole dataset.
> >>>>>
> >>>>> I believe I have to write a UDF to calculate the average. Pig is able
> >>>>> to load this data:
> >>>>>
> >>>>> (  0, 10.1, 20.1, 30, 40,)
> >>>>> (  50, 60, 70, 80.1, 1,)
> >>>>> (  2, 3, 4, 5, 6,)
> >>>>> (  7, 8, 9, 10, 11,)
> >>>>> (  12, 13, 14, 15, 16,)
> >>>>> (  1, 2, 3, 4, 5,)
> >>>>> (  56, 6, 7, 8, 9,)
> >>>>> (  9, 9, 9, 12, 1,)
> >>>>> (  3, 14, 1, 5, 6,)
> >>>>> (  7, 8, 8, 9, 12 )
> >>>>>
> >>>>> How do I invoke that UDF in my Pig script? Say I implement a
> >>>>> CalculateAvg function.
> >>>>>
> >>>>> REGISTER ./myudfs.jar
> >>>>> dividends = load 'try.txt';
> >>>>> dump dividends
> >>>>> --grouped   = group dividends by symbol;
> >>>>> avg       = generate CalculateAvg(dividends);
> >>>>> dump avg
> >>>>> --store avg into 'average_dividend';
> >>>>>
> >>>>> It fails.
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
--
Thanks and Regards
Nagamallikarjuna
inelu nagamallikarjuna 2013-03-05, 22:49