Pig, mail # user - Count of all the rows


Re: Count of all the rows
Mohit Anchlia 2012-08-30, 00:20
On Wed, Aug 29, 2012 at 4:51 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> COUNT is a UDF that takes in a Bag and outputs a Long.
>
> Relations are not Bags, so that's one way of thinking about it. But of
> course, we could have coerced the syntax to make it work.
>
> I like to think of it as such:
>
> A foreach is a transformation on the rows of a relation. Thus, applying
> COUNT directly to a relation doesn't make any sense, since you're doing an
> aggregate transformation. This is why grouping is necessary. You're putting
> all of the rows of the relation into one row (with the catch-all key
> "all"), so that you can run a function on them.
>

Thanks! I think I get it.

> Don't know if that helps.
>
> 2012/8/29 Mohit Anchlia <[EMAIL PROTECTED]>
>
> > Thanks! Why is grouping necessary? Is it to send it to the reducer?
> >
> > On Wed, Aug 29, 2012 at 4:03 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> > > A = load 'foo';
> > > B = group A all;
> > > C = foreach B generate COUNT(A);
> > >
> > > Alan.
> > >  On Aug 29, 2012, at 3:51 PM, Mohit Anchlia wrote:
> > >
> > > > How do I get count of all the rows? All the examples of COUNT use group by.
> > >
> > >
> >
>
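
For reference, the pattern discussed in this thread can be written out as a complete script. This is only a sketch: the input path 'foo' comes from Alan's example, while the field names row_count and non_null_first_field are made up for illustration. COUNT skips tuples whose first field is null, so COUNT_STAR is shown alongside it, since it counts every tuple in the bag.

```pig
-- Load the input; 'foo' is the placeholder path from Alan's example.
A = LOAD 'foo';

-- GROUP ... ALL collects every row of A into one bag under the key 'all',
-- so B holds a single row of the form (all, {all tuples of A}).
B = GROUP A ALL;

-- COUNT_STAR counts every tuple in the bag.
-- COUNT ignores tuples whose first field is null.
-- The AS names here are illustrative, not part of the original example.
C = FOREACH B GENERATE COUNT_STAR(A) AS row_count,
                       COUNT(A)      AS non_null_first_field;

DUMP C;
```

If the first field of the data can never be null, the two counts will match and plain COUNT(A) is enough.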