Re: Count of all the rows
COUNT is a UDF that takes in a Bag and outputs a Long.

Relations are not Bags, so that's one way of thinking about it. But of
course, we could have coerced the syntax to make it work.

I like to think of it as such:

A foreach is a transformation applied to each row of a relation. Thus,
applying COUNT directly to a relation doesn't make sense, since COUNT is an
aggregate over many rows, not a per-row function. This is why grouping is
necessary: you're putting all of the rows of the relation into one row
(with the catch-all key "all"), so that you can run a function on them.
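
To make that concrete, here is a rough sketch that annotates the script
Alan posted (quoted below). The path 'foo' is just the placeholder from his
example, and the COUNT_STAR line is my own addition for comparison, not
something from his message:

```pig
-- 'foo' is a placeholder input path, as in the quoted example.
A = LOAD 'foo';

-- GROUP ... ALL collapses the relation into a single row:
--   (all, {every tuple of A in one bag})
B = GROUP A ALL;

-- Inside B, the field named A is a bag, which is what COUNT expects.
-- COUNT skips tuples whose first field is null.
C = FOREACH B GENERATE COUNT(A);

-- Addition for comparison: COUNT_STAR counts every tuple, nulls included.
D = FOREACH B GENERATE COUNT_STAR(A);

DUMP C;
```

Either way, the foreach runs over the one row of B, and the aggregate runs
over the bag inside that row.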

Don't know if that helps.

2012/8/29 Mohit Anchlia <[EMAIL PROTECTED]>

> Thanks! Why is grouping necessary? Is it to send it to the reducer?
> On Wed, Aug 29, 2012 at 4:03 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> > A = load 'foo';
> > B = group A all;
> > C = foreach B generate COUNT(A);
> >
> > Alan.
> >  On Aug 29, 2012, at 3:51 PM, Mohit Anchlia wrote:
> >
> > > How do I get count of all the rows? All the examples of COUNT use
> > > group by.
> >
> >