Pig >> mail # user >> UDF to calculate Average of whole dataset


Preeti Gupta 2013-03-04, 21:56
pablomar 2013-03-05, 02:26
Jonathan Coveney 2013-03-05, 11:17
Re: UDF to calculate Average of whole dataset
I kept the code in myudfs.jar, and my Pig script points to it using the REGISTER command, but the script is not able to find the CalculateAvg function. I don't have any package defined in the Java file, and the jar is in my current directory.
On Mar 5, 2013, at 3:17 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> dividends = load 'try.txt';
> a = foreach dividends generate FLATTEN(TOBAG(*));
> b = foreach (group a all) generate CalculateAvg($1);
>
> I think that should work
>
>
> 2013/3/5 pablomar <[EMAIL PROTECTED]>
>
>> What is the error?
>> Function not found, or something like that?
>>
>> What about this?
>> avg       = generate myudfs.CalculateAvg(dividends);
>>
>>
>> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>>
>>> Hello All,
>>>
>>> I have a dataset like this:
>>>
>>> 0, 10.1, 20.1, 30, 40,
>>>  50, 60, 70, 80.1, 1,
>>>  2, 3, 4, 5, 6,
>>>  7, 8, 9, 10, 11,
>>>  12, 13, 14, 15, 16,
>>>  1, 2, 3, 4, 5,
>>>  56, 6, 7, 8, 9,
>>>  9, 9, 9, 12, 1,
>>>  3, 14, 1, 5, 6,
>>>  7, 8, 8, 9, 12
>>>
>>> So basically these are comma-separated values, but I want to treat them as one
>>> data column and calculate the average of the whole dataset.
>>>
>>> I believe I have to write a UDF to calculate the average. Pig is able to load
>>> this data:
>>>
>>> (  0, 10.1, 20.1, 30, 40,)
>>> (  50, 60, 70, 80.1, 1,)
>>> (  2, 3, 4, 5, 6,)
>>> (  7, 8, 9, 10, 11,)
>>> (  12, 13, 14, 15, 16,)
>>> (  1, 2, 3, 4, 5,)
>>> (  56, 6, 7, 8, 9,)
>>> (  9, 9, 9, 12, 1,)
>>> (  3, 14, 1, 5, 6,)
>>> (  7, 8, 8, 9, 12 )
>>>
>>> How do I invoke that UDF in my Pig script? Say I implement a
>>> CalculateAvg function.
>>>
>>> REGISTER ./myudfs.jar
>>> dividends = load 'try.txt';
>>> dump dividends
>>> --grouped   = group dividends by symbol;
>>> avg       = generate CalculateAvg(dividends);
>>> dump avg
>>> --store avg into 'average_dividend';
>>>
>>> It fails.
>>>
>>>
>>
pablomar 2013-03-05, 21:09
Preeti Gupta 2013-03-05, 21:24
inelu nagamallikarjuna 2013-03-05, 22:12
inelu nagamallikarjuna 2013-03-05, 22:49
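
For reference, the arithmetic that a CalculateAvg UDF would have to perform on this input can be sketched in plain Java. This is only an illustration, not the code from the thread: the class name CalculateAvgSketch and the average helper are hypothetical, the sketch does not use any Pig classes, and it assumes each record arrives as one line of comma-separated text, as in the dump quoted in the original message.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; a real Pig UDF would wrap similar logic.
class CalculateAvgSketch {

    // Averages every numeric token found in the given comma-separated lines.
    // Blank tokens (for example, after a trailing comma) are skipped.
    public static double average(List<String> lines) {
        double sum = 0.0;
        long count = 0;
        for (String line : lines) {
            for (String token : line.split(",")) {
                String trimmed = token.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                sum += Double.parseDouble(trimmed);
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no numeric values found");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // First two rows of the dataset from the original message.
        List<String> sample = Arrays.asList(
            "0, 10.1, 20.1, 30, 40,",
            " 50, 60, 70, 80.1, 1,");
        System.out.println(average(sample));
    }
}
```

Most rows in the sample data end with a comma and start with a space, which is why the sketch trims each token and ignores empty ones before parsing. In an actual UDF the same loop would presumably sit inside the function's exec method and read its values from the input tuple rather than from a list of strings.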