Pig >> mail # user >> avoiding Group by or filter


Re: avoiding Group by or filter
Why don't you want to group?
2013/3/5 Preeti Gupta <[EMAIL PROTECTED]>

> I want to compute the average of a one-column dataset:
> 1
> 2
> 3
> 4
> 5
>
> and I am not able to do it without grouping.
>
> However, I got an average with
>
> avg = foreach (group dividends all) generate AVG(dividends);
>
> But
>
> avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);
>
> says to use an explicit cast.
>
> My script is very small
>
> dividends = load 'myfile.txt' as (A:double);
> dump dividends
> --grouped   = filter dividends by A>-10000000.0;
> avg       = foreach (filter dividends by A>-10000000.0) generate AVG(A);
>
>
>
> <file try.pig, line 5, column 65> Multiple matching functions for
> org.apache.pig.builtin.AVG with input schema: ({{(bytearray)}},
> {{(double)}}). Please use an explicit cast.
>
>
> On Mar 4, 2013, at 8:30 PM, Prashant Kommireddi <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Preeti,
> >
> > Using FILTER or not depends on your requirements and has nothing to do with
> > SUM or AVG.
> >
> > SUM and AVG accept bags as input, so as long as you are able to provide
> > one, it should be fine. (Though it's very common for users to GROUP BY to
> > roll up on a key before using these UDFs.)
> >
> > For example:
> >
> > grunt> cat data
> > 1    5
> > 5    8
> >
> > grunt> A = load 'data';
> > grunt> B = foreach A generate TOBAG($0, $1) as bagg;
> > grunt> dump B;
> > ({(1),(5)})
> > ({(5),(8)})
> >
> > grunt> C = foreach B generate AVG(bagg);
> > grunt> dump C;
> > (3.0)
> > (6.5)
> >
> > -Prashant
> >
> >
> > On Mon, Mar 4, 2013 at 3:50 PM, Preeti Gupta <[EMAIL PROTECTED]> wrote:
> >
> >> Hello,
> >>
> >> Can I compute SUM or AVG without using GROUP BY or FILTER?
> >>
>
>
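
For reference, here is a minimal sketch of how the two approaches in the quoted script can be combined. It assumes the same 'myfile.txt' with one numeric value per line, as in the quoted message. The aliases kept, grouped and average are made up for illustration. The likely cause of the "explicit cast" error is that AVG expects a bag, while A in a filtered row is still a plain double, so Pig cannot pick a matching AVG implementation. FILTER only drops rows; it does not collect them into a bag. GROUP ... ALL does.

```pig
-- Load the single-column file; the path and schema follow the quoted script.
dividends = LOAD 'myfile.txt' AS (A:double);

-- FILTER keeps or drops whole rows. Each surviving row still holds a scalar A,
-- so there is no bag for AVG to aggregate at this point.
kept = FILTER dividends BY A > -10000000.0;

-- GROUP ... ALL collects every remaining row into a single bag,
-- which is named after the input relation (here: kept).
grouped = GROUP kept ALL;

-- Project column A out of that bag and average it.
average = FOREACH grouped GENERATE AVG(kept.A);

DUMP average;
```

If the filter is not actually needed, the kept step can be dropped and dividends grouped directly, which is what the working one-liner in the quoted message does.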