

Preeti Gupta 2013-03-04, 21:56
pablomar 2013-03-05, 02:26
Jonathan Coveney 2013-03-05, 11:17
Preeti Gupta 2013-03-05, 21:04
pablomar 2013-03-05, 21:09
Re: UDF to calculate Average of whole dataset
Nope. It does not work.

2013-03-05 13:22:28,768 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve myudf.CalculateAvg using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362518535200.log
~                                                      

Pig script:

REGISTER ./myudfs.jar;
dividends = load 'myfile' as (A);
dump dividends
--grouped   = filter dividends by A>-10000000.0;
--avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);
avg = foreach (group dividends all) generate myudf.CalculateAvg(dividends);
dump avg
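
A note on the error above: ERROR 1070 lists the prefixes that were tried (an empty prefix, `org.apache.pig.builtin.`, and `org.apache.pig.impl.builtin.`). The sketch below is a simplified illustration of that kind of prefix-based lookup, not Pig's actual code; `ResolveSketch`, `resolve`, and the stand-in prefix `java.util.` are hypothetical names chosen so the example runs without Pig. It suggests why a qualifier such as `myudf.` only helps if the class really lives in a package of that name. If the class was compiled without a package declaration, calling it by its bare name (`CalculateAvg(...)`) may be worth trying.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ResolveSketch {
    /** Tries each import prefix in order and returns the first class that loads. */
    public static Optional<Class<?>> resolve(String name, List<String> prefixes) {
        for (String prefix : prefixes) {
            try {
                return Optional.of(Class.forName(prefix + name));
            } catch (ClassNotFoundException e) {
                // Not found under this prefix; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in prefixes (assumption): "" plays the role of the empty
        // prefix in the error message, "java.util." the role of a built-in package.
        List<String> prefixes = Arrays.asList("", "java.util.");

        // The bare name resolves through one of the prefixes.
        System.out.println(resolve("ArrayList", prefixes).isPresent());       // true

        // A made-up package qualifier matches no prefix, so lookup fails.
        System.out.println(resolve("myudf.ArrayList", prefixes).isPresent()); // false
    }
}
```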

My jar file:

bash-3.2# vi a.txt

     0 Mon Mar 04 13:45:44 PST 2013 META-INF/
    60 Mon Mar 04 13:45:44 PST 2013 META-INF/MANIFEST.MF
  1190 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Final.class
  1306 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Initial.class
  1477 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Intermediate.class
  4205 Mon Mar 04 13:45:16 PST 2013 CalculateAvg.class
~                                                      
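
The nested class names in the listing (`Initial`, `Intermediate`, `Final`) suggest the UDF follows the three-stage pattern of Pig's `Algebraic` interface, although the thread does not show the source. As a minimal sketch of how an average fits that pattern, assuming each stage passes along a (sum, count) pair, the plain-Java example below avoids any Pig dependency; `AlgebraicAvgSketch`, `Partial`, and the method names are hypothetical. A real Pig UDF would extend `EvalFunc` and work on tuples and bags instead.

```java
import java.util.Arrays;
import java.util.List;

public class AlgebraicAvgSketch {
    /** Partial result carried between stages: a running sum and a count. */
    public static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
    }

    /** Initial stage: turn one input value into a (sum, count) pair. */
    public static Partial initial(double value) {
        return new Partial(value, 1);
    }

    /** Intermediate stage: merge partial pairs; safe to apply repeatedly. */
    public static Partial intermediate(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Final stage: merge the remaining partials and divide exactly once. */
    public static Double finalStage(List<Partial> partials) {
        Partial total = intermediate(partials);
        return total.count == 0 ? null : total.sum / total.count;
    }

    public static void main(String[] args) {
        // Two hypothetical chunks of input, each reduced independently.
        Partial a = intermediate(Arrays.asList(initial(1), initial(2), initial(3)));
        Partial b = intermediate(Arrays.asList(initial(4), initial(5)));
        System.out.println(finalStage(Arrays.asList(a, b))); // 3.0
    }
}
```

Carrying the count alongside the sum is what keeps the result correct when chunks have different sizes; averaging the per-chunk averages would not.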

On Mar 5, 2013, at 1:09 PM, pablomar <[EMAIL PROTECTED]> wrote:

> did you try with {jarFileName}.{FunctionName} ?
> example: myudfs.CalculateAvg ?
>
>
> On Tue, Mar 5, 2013 at 4:04 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>
>> I kept the code in myudfs.jar, and my Pig script points to it with the
>> REGISTER command, but the script cannot find the CalculateAvg function.
>> I don't have any package defined in the Java file, and the jar is in my
>> current directory.
>>
>>
>> On Mar 5, 2013, at 3:17 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>
>>> dividends = load 'try.txt';
>>> a = foreach dividends generate FLATTEN(TOBAG(*));
>>> b = foreach (group a all) generate CalculateAvg($1);
>>>
>>> I think that should work
>>>
>>>
>>> 2013/3/5 pablomar <[EMAIL PROTECTED]>
>>>
>>>> what is the error ?
>>>> function not found or something like that ?
>>>>
>>>> what about this ?
>>>> avg       = generate myudfs.CalculateAvg(dividends);
>>>>
>>>>
>>>> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I have a dataset like this:
>>>>>
>>>>> 0, 10.1, 20.1, 30, 40,
>>>>> 50, 60, 70, 80.1, 1,
>>>>> 2, 3, 4, 5, 6,
>>>>> 7, 8, 9, 10, 11,
>>>>> 12, 13, 14, 15, 16,
>>>>> 1, 2, 3, 4, 5,
>>>>> 56, 6, 7, 8, 9,
>>>>> 9, 9, 9, 12, 1,
>>>>> 3, 14, 1, 5, 6,
>>>>> 7, 8, 8, 9, 12
>>>>>
>>>>> These are comma-separated values, but I want to treat them as a single
>>>>> data column and calculate the average of the whole dataset.
>>>>>
>>>>> I believe I have to write a UDF to calculate the average. Pig is able to
>>>>> load this data:
>>>>>
>>>>> (  0, 10.1, 20.1, 30, 40,)
>>>>> (  50, 60, 70, 80.1, 1,)
>>>>> (  2, 3, 4, 5, 6,)
>>>>> (  7, 8, 9, 10, 11,)
>>>>> (  12, 13, 14, 15, 16,)
>>>>> (  1, 2, 3, 4, 5,)
>>>>> (  56, 6, 7, 8, 9,)
>>>>> (  9, 9, 9, 12, 1,)
>>>>> (  3, 14, 1, 5, 6,)
>>>>> (  7, 8, 8, 9, 12 )
>>>>>
>>>>> How do I invoke that UDF in my Pig script? Say I implement a
>>>>> CalculateAvg function.
>>>>>
>>>>> REGISTER ./myudfs.jar
>>>>> dividends = load 'try.txt';
>>>>> dump dividends
>>>>> --grouped   = group dividends by symbol;
>>>>> avg       = generate CalculateAvg(dividends);
>>>>> dump avg
>>>>> --store avg into 'average_dividend';
>>>>>
>>>>> It fails.
>>>>>
>>>>>
>>>>
>>
>>
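
For reference, the computation the original question asks for (treat every comma-separated value in the file as one column, then average it) can be sketched in plain Java as below. This is an illustration of the flatten-then-average idea under stated assumptions, not the poster's UDF: `FlattenAvgSketch`, `flatten`, and `average` are hypothetical names, and the sketch assumes each row is a single string whose trailing comma produces an empty token that should be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenAvgSketch {
    /** Splits every row on commas and collects all numeric tokens into one list. */
    public static List<Double> flatten(List<String> rows) {
        List<Double> values = new ArrayList<>();
        for (String row : rows) {
            for (String token : row.split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {          // skip blanks left by stray commas
                    values.add(Double.parseDouble(trimmed));
                }
            }
        }
        return values;
    }

    /** Averages the flattened values; rejects empty input instead of dividing by zero. */
    public static double average(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values to average");
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // Two rows taken from the sample data in the original question.
        List<String> rows = Arrays.asList("2, 3, 4, 5, 6,", "7, 8, 9, 10, 11,");
        System.out.println(average(flatten(rows))); // 6.5
    }
}
```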
inelu nagamallikarjuna 2013-03-05, 22:12
inelu nagamallikarjuna 2013-03-05, 22:49