
Pig, mail # user - UDF to calculate Average of whole dataset


UDF to calculate Average of whole dataset
Preeti Gupta 2013-03-04, 21:56
Hello All,

I have a dataset like this:

 0, 10.1, 20.1, 30, 40,
  50, 60, 70, 80.1, 1,
  2, 3, 4, 5, 6,
  7, 8, 9, 10, 11,
  12, 13, 14, 15, 16,
  1, 2, 3, 4, 5,
  56, 6, 7, 8, 9,
  9, 9, 9, 12, 1,
  3, 14, 1, 5, 6,
  7, 8, 8, 9, 12

So it is basically comma-separated values. I want to treat all of them as a single data column and calculate the average of the whole dataset.
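If every value is treated as one column, the average is simply the sum of all values divided by how many values there are. A minimal plain-Java sketch of that arithmetic, outside Pig and assuming the whole file fits in a string, might look like this. `WholeDatasetAverage` and `average` are illustrative names, not part of any library. Splitting on runs of commas and whitespace, then dropping empty tokens, copes with the leading spaces and trailing commas in the sample.

```java
import java.util.Arrays;

public class WholeDatasetAverage {

    // Illustrative helper: every number in the text, on any line, counts as one
    // value of a single column; the result is sum / count.
    static double average(String text) {
        return Arrays.stream(text.split("[,\\s]+"))   // split on commas and whitespace, including newlines
                .filter(token -> !token.isEmpty())     // leading spaces produce an empty first token
                .mapToDouble(Double::parseDouble)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no numeric values found"));
    }

    public static void main(String[] args) {
        String data = String.join("\n",
                " 0, 10.1, 20.1, 30, 40,",
                "  50, 60, 70, 80.1, 1,",
                "  2, 3, 4, 5, 6,",
                "  7, 8, 9, 10, 11,",
                "  12, 13, 14, 15, 16,",
                "  1, 2, 3, 4, 5,",
                "  56, 6, 7, 8, 9,",
                "  9, 9, 9, 12, 1,",
                "  3, 14, 1, 5, 6,",
                "  7, 8, 8, 9, 12");
        System.out.println(average(data));
    }
}
```

For the ten sample lines (50 values summing to 710.3) this prints approximately 14.206.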

I believe I have to write a UDF to calculate the average. Pig is able to load this data, and dumping it gives:

(  0, 10.1, 20.1, 30, 40,)
(  50, 60, 70, 80.1, 1,)
(  2, 3, 4, 5, 6,)
(  7, 8, 9, 10, 11,)
(  12, 13, 14, 15, 16,)
(  1, 2, 3, 4, 5,)
(  56, 6, 7, 8, 9,)
(  9, 9, 9, 12, 1,)
(  3, 14, 1, 5, 6,)
(  7, 8, 8, 9, 12 )

How do I invoke that UDF in my Pig script? Say I implement a CalculateAvg function:

REGISTER ./myudfs.jar
dividends = load 'try.txt';
dump dividends
--grouped   = group dividends by symbol;
avg       = generate CalculateAvg(dividends);
dump avg
--store avg into 'average_dividend';

It fails.
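One likely reason the script fails is that in Pig Latin GENERATE is only valid inside a FOREACH, and a UDF sees the whole dataset in one call only if the relation has first been collapsed into a single bag, typically with `GROUP dividends ALL` followed by `FOREACH ... GENERATE CalculateAvg(dividends)`. The dump above also suggests each line was loaded as a single field (PigStorage splits on tabs by default), so the UDF would still have to split each line on commas. The plain-Java model below sketches that logic without the Pig API. `CalculateAvgModel` and `exec` are hypothetical names, the bag is modelled as a list of tuples and each tuple as a list of fields, and returning `null` for an empty bag is an assumption. A real UDF would instead extend Pig's `EvalFunc<Double>` and pull the bag out of the `Tuple` passed to `exec`.

```java
import java.util.List;

public class CalculateAvgModel {

    // Hypothetical model of the UDF body: the grouped bag is a list of tuples,
    // each tuple a list of fields; here each field holds one raw text line.
    static Double exec(List<? extends List<?>> bag) {
        // Keep a running (sum, count) instead of averaging per-line averages,
        // so the result stays correct when lines hold different numbers of values.
        double sum = 0.0;
        long count = 0;
        for (List<?> tuple : bag) {
            for (Object field : tuple) {
                if (field == null) {
                    continue;                          // skip missing fields
                }
                for (String token : field.toString().split(",")) {
                    String trimmed = token.trim();
                    if (trimmed.isEmpty()) {
                        continue;                      // e.g. whitespace-only pieces
                    }
                    sum += Double.parseDouble(trimmed);
                    count++;
                }
            }
        }
        if (count == 0) {
            return null;                               // assumption: "no value" for an empty bag
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Three of the loaded lines, each as a single-field tuple, as in the dump.
        List<List<String>> bag = List.of(
                List.of("  0, 10.1, 20.1, 30, 40,"),
                List.of("  50, 60, 70, 80.1, 1,"),
                List.of("  7, 8, 8, 9, 12 "));
        System.out.println(exec(bag));
    }
}
```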
pablomar 2013-03-05, 02:26
Jonathan Coveney 2013-03-05, 11:17
Preeti Gupta 2013-03-05, 21:04
pablomar 2013-03-05, 21:09
Preeti Gupta 2013-03-05, 21:24
inelu nagamallikarjuna 2013-03-05, 22:12
inelu nagamallikarjuna 2013-03-05, 22:49