Pig >> mail # user >> UDF to calculate Average of whole dataset


Preeti Gupta 2013-03-04, 21:56
pablomar 2013-03-05, 02:26
Jonathan Coveney 2013-03-05, 11:17
Preeti Gupta 2013-03-05, 21:04
pablomar 2013-03-05, 21:09

Re: UDF to calculate Average of whole dataset
Nope, it does not work.

2013-03-05 13:22:28,768 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve myudf.CalculateAvg using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362518535200.log

Pig script

REGISTER ./myudfs.jar;
dividends = load 'myfile' as (A);
dump dividends;
--grouped   = filter dividends by A>-10000000.0;
--avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);
avg = foreach (group dividends all) generate myudf.CalculateAvg(dividends);
dump avg;

My jar file

bash-3.2# vi a.txt

     0 Mon Mar 04 13:45:44 PST 2013 META-INF/
    60 Mon Mar 04 13:45:44 PST 2013 META-INF/MANIFEST.MF
  1190 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Final.class
  1306 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Initial.class
  1477 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Intermediate.class
  4205 Mon Mar 04 13:45:16 PST 2013 CalculateAvg.class
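
Two things stand out in the listing above. The classes sit at the root of the jar with no package directory, which matches the statement quoted below that no package is declared in the Java file, so Pig would most likely resolve the function by its bare class name, CalculateAvg, rather than by myudf.CalculateAvg or myudfs.CalculateAvg (the jar file name is not part of the lookup). The nested Initial, Intermediate and Final classes also suggest that the UDF follows Pig's Algebraic pattern, in which an average is carried as a (sum, count) pair until the last step. What follows is a minimal plain-Java sketch of that pattern, with no Pig dependency. The class name AverageSketch and all of its methods are hypothetical and are not the contents of the poster's CalculateAvg.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a Pig-free sketch of the sum/count
// bookkeeping that an Algebraic average UDF typically performs.
final class AverageSketch {

    /** Partial result: sum and count of the values seen so far. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** "Initial" step: reduce one comma-separated line to a partial result. */
    static Partial initial(String line) {
        double sum = 0.0;
        long count = 0;
        for (String field : line.split(",")) {
            String trimmed = field.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank fields, e.g. around a trailing comma
            }
            sum += Double.parseDouble(trimmed);
            count++;
        }
        return new Partial(sum, count);
    }

    /** "Intermediate" step: merge several partial results into one. */
    static Partial intermediate(List<Partial> partials) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** "Final" step: merge what is left and divide once, at the very end. */
    static double finalStep(List<Partial> partials) {
        Partial total = intermediate(partials);
        return total.count == 0 ? Double.NaN : total.sum / total.count;
    }

    /** Convenience wrapper: average over every value on every line. */
    static double average(List<String> lines) {
        List<Partial> partials = new ArrayList<>();
        for (String line : lines) {
            partials.add(initial(line));
        }
        return finalStep(partials);
    }

    public static void main(String[] args) {
        // First two rows of the dataset quoted later in this message.
        List<String> lines = Arrays.asList(
                "0, 10.1, 20.1, 30, 40,",
                "50, 60, 70, 80.1, 1,");
        System.out.println(average(lines));
    }
}
```

Keeping sum and count separate until the final step is what lets the partial results be merged in any order; averaging per-line averages would only be correct if every line held the same number of values.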

On Mar 5, 2013, at 1:09 PM, pablomar <[EMAIL PROTECTED]> wrote:

> did you try with {jarFileName}.{FunctionName} ?
> example: myudfs.CalculateAvg ?
>
>
> On Tue, Mar 5, 2013 at 4:04 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>
>> I kept the code in myudfs.jar and my Pig script points to it using the
>> REGISTER command, but the script is not able to find the CalculateAvg
>> function. I don't have any packages defined in the Java file and the jar
>> is in my current directory.
>>
>>
>> On Mar 5, 2013, at 3:17 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>
>>> dividends = load 'try.txt'
>>> a = foreach dividends generate FLATTEN(TOBAG(*));
>>> b = foreach (group a all) generate CalculateAvg($1);
>>>
>>> I think that should work
>>>
>>>
>>> 2013/3/5 pablomar <[EMAIL PROTECTED]>
>>>
>>>> what is the error ?
>>>> function not found or something like that ?
>>>>
>>>> what about this ?
>>>> avg       = generate myudfs.CalculateAvg(dividends);
>>>>
>>>>
>>>> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I have a dataset like this:
>>>>>
>>>>> 0, 10.1, 20.1, 30, 40,
>>>>> 50, 60, 70, 80.1, 1,
>>>>> 2, 3, 4, 5, 6,
>>>>> 7, 8, 9, 10, 11,
>>>>> 12, 13, 14, 15, 16,
>>>>> 1, 2, 3, 4, 5,
>>>>> 56, 6, 7, 8, 9,
>>>>> 9, 9, 9, 12, 1,
>>>>> 3, 14, 1, 5, 6,
>>>>> 7, 8, 8, 9, 12
>>>>>
>>>>> So basically these are comma-separated values, but I want to treat them
>>>>> as one data column and calculate the average of the whole dataset.
>>>>>
>>>>> I believe I have to write a UDF to calculate the average. Pig is able to
>>>>> load this data:
>>>>>
>>>>> (  0, 10.1, 20.1, 30, 40,)
>>>>> (  50, 60, 70, 80.1, 1,)
>>>>> (  2, 3, 4, 5, 6,)
>>>>> (  7, 8, 9, 10, 11,)
>>>>> (  12, 13, 14, 15, 16,)
>>>>> (  1, 2, 3, 4, 5,)
>>>>> (  56, 6, 7, 8, 9,)
>>>>> (  9, 9, 9, 12, 1,)
>>>>> (  3, 14, 1, 5, 6,)
>>>>> (  7, 8, 8, 9, 12 )
>>>>>
>>>>> How do I invoke that UDF in my Pig script? Say I implement a
>>>>> CalculateAvg function.
>>>>>
>>>>> REGISTER ./myudfs.jar
>>>>> dividends = load 'try.txt';
>>>>> dump dividends
>>>>> --grouped   = group dividends by symbol;
>>>>> avg       = generate CalculateAvg(dividends);
>>>>> dump avg
>>>>> --store avg into 'average_dividend';
>>>>>
>>>>> It fails.
>>>>>
>>>>>
>>>>
>>
>>
inelu nagamallikarjuna 2013-03-05, 22:12
inelu nagamallikarjuna 2013-03-05, 22:49