Pig, mail # user - avoiding Group by or filter


Preeti Gupta 2013-03-04, 23:50
Prashant Kommireddi 2013-03-05, 04:30
Re: avoiding Group by or filter
Preeti Gupta 2013-03-05, 04:36
I want to compute the average of a one-column dataset:
1
2
3
4
5

and I am not able to do it without grouping.

I did get an average with

avg = foreach (group dividends all) generate AVG(dividends);

but

avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);

fails with an error telling me to use an explicit cast.

My script is very small

dividends = load 'myfile.txt' as (A:double);
dump dividends;
--grouped   = filter dividends by A>-10000000.0;
avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);

<file try.pig, line 5, column 65> Multiple matching functions for org.apache.pig.builtin.AVG with input schema: ({{(bytearray)}}, {{(double)}}). Please use an explicit cast.

On Mar 4, 2013, at 8:30 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Hi Preeti,
>
> Using FILTER or not depends on your requirements and has nothing to do with
> SUM or AVG.
>
> SUM and AVG accept bags as input, so as long as you are able to provide one
> it should be fine. (Though it's very common for users to use GROUP BY to
> roll up on a key before using these UDFs.)
>
> For example:
>
> grunt> cat data
> 1    5
> 5    8
>
> grunt> A = load 'data';
> grunt> B = foreach A generate TOBAG($0, $1) as bagg;
> grunt> dump B;
> ({(1),(5)})
> ({(5),(8)})
>
> grunt> C = foreach B generate AVG(bagg);
> grunt> dump C;
> (3.0)
> (6.5)
>
> -Prashant
>
>
> On Mon, Mar 4, 2013 at 3:50 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> Can I compute SUM or AVG without using GROUP BY or FILTER?
>>
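
A note on the error in this message: FILTER returns a relation with the same rows and schema as its input, so inside the foreach, A is still a single double per row rather than a bag. AVG expects a bag, which is probably why Pig cannot pick a matching function and asks for an explicit cast. The following is a minimal sketch of the usual way to average a whole column, assuming the same 'myfile.txt' and schema as in the script above; the aliases grouped and avg_all are illustrative names, not taken from the thread.

```pig
-- Assumes the one-column input file from the message above.
dividends = LOAD 'myfile.txt' AS (A:double);

-- GROUP ... ALL puts every row into one bag, named after the relation.
grouped = GROUP dividends ALL;

-- dividends.A projects column A out of that bag, so AVG receives a bag of doubles.
avg_all = FOREACH grouped GENERATE AVG(dividends.A);

DUMP avg_all;
```

For the five sample values 1 through 5, the average should come out as 3.0. If some rows need to be excluded, the FILTER can be applied before the GROUP ... ALL step; it does not replace it.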

Jonathan Coveney 2013-03-05, 11:14
Preeti Gupta 2013-03-05, 15:10
Eli Finkelshteyn 2013-03-05, 02:11
Jonathan Coveney 2013-03-05, 22:06