Pig >> mail # user >> UDF to calculate Average of whole dataset


Preeti Gupta 2013-03-04, 21:56
pablomar 2013-03-05, 02:26
Jonathan Coveney 2013-03-05, 11:17
Preeti Gupta 2013-03-05, 21:04
pablomar 2013-03-05, 21:09
Preeti Gupta 2013-03-05, 21:24
inelu nagamallikarjuna 2013-03-05, 22:12
Re: UDF to calculate Average of whole dataset
Hi,

Here is a sample UDF and how to use it in a Pig script.

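*JAVA CLASS:*

package myudf.udf.upper;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UPPER extends EvalFunc<String>
{
        // Sketch of the elided logic: upper-case the first field of each input tuple.
        public String exec(Tuple input) throws IOException {
                if (input == null || input.size() == 0 || input.get(0) == null) return null;
                return ((String) input.get(0)).toUpperCase();
        }
}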

*input data:*
naga
siva
ravi

*Pig Script*

-- Always use the absolute path of the UDF jar
register /home/naga/bigdata/pig-0.10.0/upper.jar;
data = load '/data/names/' using PigStorage() as (name: chararray);
names = foreach data generate myudf.udf.upper.UPPER(name);
dump names;

*output:*

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-03-06 04:08:14,017 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-03-06 04:08:14,018 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(NAGA)
(SIVA)
(RAVI)
Thanks
Nagamallikarjuna
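
To illustrate why the quoted "Could not resolve myudf.CalculateAvg using imports" error appears, here is a rough sketch of name resolution against a list of import prefixes. It is plain Java, not Pig's actual code: the class name UdfNameResolution and the method resolve are made up for this example, and the prefix list uses java.lang. and java.util. as stand-ins for Pig's built-in packages.

```java
import java.util.Arrays;
import java.util.List;

public class UdfNameResolution {
    // Hypothetical helper: try each import prefix in order and return the
    // first class that loads, or null if none of the candidates exists.
    public static Class<?> resolve(String name, List<String> importPrefixes) {
        for (String prefix : importPrefixes) {
            try {
                return Class.forName(prefix + name);
            } catch (ClassNotFoundException e) {
                // No class under this prefix; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The empty prefix means "use the name exactly as written".
        List<String> prefixes = Arrays.asList("", "java.lang.", "java.util.");
        System.out.println(resolve("String", prefixes));
        System.out.println(resolve("myudf.CalculateAvg", prefixes));
    }
}
```

Under this model, a class compiled without a package statement would be found through the empty prefix under its bare name (CalculateAvg), while myudf.CalculateAvg names a class in a package called myudf, which the jar listing in the quoted message does not contain.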
On Wed, Mar 6, 2013 at 3:42 AM, inelu nagamallikarjuna <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Use the fully qualified class name, such as org.apache.udf.myudf.udfName, in
> the Pig script when calling the UDF.
> Otherwise, use only the UDF name in the script and pass the package when running it:
> pig -Dudf.import.list=org.apache.udf.myudf.evaluation.string scriptname.pig
>
>
> Thanks
> Nagamallikarjuna
>
>
> On Wed, Mar 6, 2013 at 2:54 AM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>
>> Nope. It does not work
>>
>> 2013-03-05 13:22:28,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1070: Could not resolve myudf.CalculateAvg using imports: [,
>> org.apache.pig.builtin., org.apache.pig.impl.builtin.]
>> Details at logfile:
>> /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362518535200.log
>> ~
>>
>> Pig script
>>
>> REGISTER ./myudfs.jar;
>> dividends = load 'myfile' as (A);
>> dump dividends
>> --grouped   = filter dividends by A>-10000000.0;
>> --avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);
>> avg = foreach (group dividends all) generate
>> myudf.CalculateAvg(dividends);
>> dump avg
>>
>> My jar file
>>
>> bash-3.2# vi a.txt
>>
>>      0 Mon Mar 04 13:45:44 PST 2013 META-INF/
>>     60 Mon Mar 04 13:45:44 PST 2013 META-INF/MANIFEST.MF
>>   1190 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Final.class
>>   1306 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Initial.class
>>   1477 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Intermediate.class
>>   4205 Mon Mar 04 13:45:16 PST 2013 CalculateAvg.class
>> ~
>>
>> On Mar 5, 2013, at 1:09 PM, pablomar <[EMAIL PROTECTED]> wrote:
>>
>> > did you try with {jarFileName}.{FunctionName} ?
>> > example: myudfs.CalculateAvg ?
>> >
>> >
>> > On Tue, Mar 5, 2013 at 4:04 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>> >
>> >> I kept the code in myudfs.jar, and my Pig script points to it using the
>> >> register command, but the script is not able to find the CalculateAvg
>> >> function. I don't have any packages defined in the Java file, and the jar
>> >> is in my current directory.
>> >>
>> >>
>> >> On Mar 5, 2013, at 3:17 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> dividends = load 'try.txt';
>> >>> a = foreach dividends generate FLATTEN(TOBAG(*));
>> >>> b = foreach (group a all) generate CalculateAvg($1);
>> >>>
>> >>> I think that should work
>> >>>
>> >>>
>> >>> 2013/3/5 pablomar <[EMAIL PROTECTED]>
>> >>>
>> >>>> what is the error ?
>> >>>> function not found or something like that ?
>> >>>>
>> >>>> what about this ?
>> >>>> avg       = generate myudfs.CalculateAvg(dividends);
>> >>>>
>> >>>>
>> >>>> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>>> Hello All,
>> >>>>>
>> >>>>> I have dataset like
>> >>>>>
>> >>>>> 0, 10.1, 20.1, 30, 40,
>> >>>>> 50, 60, 70, 80.1, 1,
>> >>>>> 2, 3, 4, 5, 6,
>> >>>>> 7, 8, 9, 10, 11,
>> >>>>> 12, 13, 14, 15, 16,
>> >>>>> 1, 2, 3, 4, 5,
>> >>>>> 56, 6, 7, 8, 9,
>> >>>>> 9, 9, 9, 12, 1,
>> >>>>> 3, 14, 1, 5, 6,
>> >>>>> 7, 8, 8, 9, 12
>
Thanks and Regards
Nagamallikarjuna
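
The jar listing quoted above contains CalculateAvg$Initial, CalculateAvg$Intermediate and CalculateAvg$Final, which suggests a three-stage (partial, combine, final) average. The poster's code is not shown in the thread, so the following is only a plain-Java sketch of that idea with no Pig dependency; the class AverageSketch and all of its members are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class AverageSketch {
    // A partial result: the sum and count of the values seen so far.
    public static final class Partial {
        public final double sum;
        public final long count;
        public Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // "Initial" stage: wrap a single value as a partial result.
    public static Partial initial(double value) {
        return new Partial(value, 1);
    }

    // "Intermediate" stage: merge several partial results into one.
    public static Partial intermediate(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // "Final" stage: merge the remaining partials and divide.
    // Returns null when there were no values at all.
    public static Double finalAverage(List<Partial> partials) {
        Partial total = intermediate(partials);
        return total.count == 0 ? null : total.sum / total.count;
    }

    public static void main(String[] args) {
        // Two chunks of values, combined separately and then merged.
        Partial a = intermediate(Arrays.asList(initial(1), initial(2)));
        Partial b = intermediate(Arrays.asList(initial(3), initial(6)));
        System.out.println(finalAverage(Arrays.asList(a, b)));
    }
}
```

Carrying a (sum, count) pair instead of a running average is what lets the partial results be merged in any order, which is the property a combiner-friendly average needs.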