-
avoiding Group by or filter
Preeti Gupta 2013-03-04, 23:50
Hello,
Can I compute SUM or AVG without using GROUPBY OR FILTER?
-
Re: avoiding Group by or filter
Eli Finkelshteyn 2013-03-05, 02:11
Yes. You can use any eval function such as SUM or AVG as long as your data is in the format (item1, … , item, {(tup1), …(tupn)}). See http://pig.apache.org/docs/r0.10.0/func.html#eval-functions for more info.
-
Re: avoiding Group by or filter
Prashant Kommireddi 2013-03-05, 04:30
Hi Preeti,
Using FILTER or not depends on your requirements and has nothing to do with SUM or AVG.
SUM and AVG accept bags as input, so as long as you are able to provide one, it should be fine. (Though it's very common for users to GROUP BY to roll up on a key before using these UDFs.)
For example:
grunt> cat data
1 5
5 8

grunt> A = load 'data';
grunt> B = foreach A generate TOBAG($0, $1) as bagg;
grunt> dump B;
({(1),(5)})
({(5),(8)})

grunt> C = foreach B generate AVG(bagg);
grunt> dump C;
(3.0)
(6.5)
-Prashant
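As a variation on the example above, here is a minimal sketch that applies SUM and AVG to the same per-row bag. It assumes the file 'data' is tab-delimited and that both columns can be loaded as int; the aliases x, y, total and mean are illustrative names, not part of the original session.

```pig
-- Assumption: 'data' is tab-delimited with two integer columns.
A = LOAD 'data' AS (x:int, y:int);

-- Build one bag per row from the two fields, as in the session above.
B = FOREACH A GENERATE TOBAG(x, y) AS bagg;

-- Both functions take the bag as their argument; no GROUP BY is involved.
C = FOREACH B GENERATE SUM(bagg) AS total, AVG(bagg) AS mean;
DUMP C;
```

Each output row aggregates only the values of its own input row, because the bag is built per row rather than across the relation.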
-
Re: avoiding Group by or filter
Preeti Gupta 2013-03-05, 04:36
I want to compute the average of a one-column dataset:
1
2
3
4
5
and I am not able to do it without grouping.
However I got an average with
avg = foreach (group dividends all) generate AVG(dividends);
But
avg = foreach (filter dividends by A>-10000000.0) generate AVG(A);
says use explicit cast.
My script is very small
dividends = load 'myfile.txt' as (A:double);
dump dividends
--grouped = filter dividends by A>-10000000.0;
avg = foreach (filter dividends by A>-10000000.0) generate AVG(A);

<file try.pig, line 5, column 65> Multiple matching functions for org.apache.pig.builtin.AVG with input schema: ({{(bytearray)}}, {{(double)}}). Please use an explicit cast.
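The error above appears to arise because, after a FILTER, A is still a plain double field on each row rather than a bag, so Pig cannot match it to any of AVG's bag-typed signatures. A minimal sketch of the GROUP ALL form, reusing the file name and schema from the script above (the aliases grouped and avg_all are illustrative):

```pig
-- Same load as in the script above: one double per line.
dividends = LOAD 'myfile.txt' AS (A:double);

-- GROUP ... ALL collects every row into one tuple: ('all', {bag of rows}).
grouped = GROUP dividends ALL;

-- dividends.A projects the A column out of the bag, so AVG receives
-- a bag of doubles and the function lookup is unambiguous.
avg_all = FOREACH grouped GENERATE AVG(dividends.A);
DUMP avg_all;
```

For the five-row input 1 through 5 shown earlier, this should print (3.0).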
-
Re: avoiding Group by or filter
Jonathan Coveney 2013-03-05, 11:14
Why don't you want to group?
-
Re: avoiding Group by or filter
Preeti Gupta 2013-03-05, 15:10
Because there is nothing to group.
-
Re: avoiding Group by or filter
Jonathan Coveney 2013-03-05, 22:06
There have been a number of explanations of this topic before, so I would prefer to point at one of them (or make sure we document it better), but basically all of the aggregation functions we use (SUM, AVG, etc.) operate on bags. This is actually true in SQL as well: it just hides the "group all", but it is implied.
In this case, you are grouping all of the rows together in order to run the function on them, since you cannot run a function on a relation, only on a bag.
Does that make sense? I know this is sort of an annoying nuance to understand in Pig...
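To make the SQL comparison concrete, here is a hedged sketch of what a whole-table aggregate looks like once the implied "group all" is written out. It assumes the one-column file and schema from earlier in the thread; the aliases grouped, stats, n, total and mean are illustrative.

```pig
dividends = LOAD 'myfile.txt' AS (A:double);

-- The step SQL leaves implicit: gather the whole relation into one bag.
grouped = GROUP dividends ALL;

-- DESCRIBE should show a 'group' key plus a bag named 'dividends'.
DESCRIBE grouped;

-- Several aggregates can be computed over the same bag in one pass.
stats = FOREACH grouped GENERATE
    COUNT(dividends)  AS n,
    SUM(dividends.A)  AS total,
    AVG(dividends.A)  AS mean;
DUMP stats;
```

This corresponds roughly to SELECT COUNT(*), SUM(A), AVG(A) FROM dividends in SQL, where no GROUP BY clause is written but all rows are still treated as a single group.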