Re: Count of all the rows
Mohit Anchlia 2012-08-30, 16:57
I looked at the definition of Relation, which says:
A relation is a bag (more specifically, an outer bag).
If a relation is a bag, then what's the difference between a Bag and a Relation?
I am getting a bit confused by the definitions. In the example below, what would
be a Relation, a Tuple, or a Bag?
Is 1,2,3,4 without "(" a tuple? Then what is a Relation or a Bag?
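
As a rough sketch of how the three terms map onto the example quoted below (the `as (x:int)` schema and the sample values in the comments are assumptions for illustration, not part of the original script):

```pig
-- Assumes a hypothetical input file 'foo' with one integer per line.
A = load 'foo' as (x:int);       -- A is a relation (an outer bag) of tuples, e.g. (1), (2), (3)
B = group A all;                 -- B is a relation with a single tuple: (all, {(1),(2),(3)})
                                 -- its second field, named A, is an inner bag holding every row of A
C = foreach B generate COUNT(A); -- COUNT receives that inner bag and returns one long
dump C;
```

Read this way, each row is a tuple, a relation is the outer bag that holds the rows, and an inner bag is a bag stored as a field inside a tuple, which is what COUNT operates on here.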
On Wed, Aug 29, 2012 at 4:51 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> COUNT is a UDF that takes in a Bag and outputs a Long.
> Relations are not Bags, so that's one way of thinking about it. But of
> course, we could have coerced the syntax to make it work.
> I like to think of it as such:
> A foreach is a transformation on the rows of a relation. Thus, applying
> COUNT directly to a relation doesn't make any sense, since you're doing an
> aggregate transformation. This is why grouping is necessary. You're putting
> all of the rows of the relation into one row (with the catch-all key
> "all"), so that you can run a function on them.
> Don't know if that helps.
> 2012/8/29 Mohit Anchlia <[EMAIL PROTECTED]>
> > Thanks! Why is grouping necessary? Is it to send it to the reducer?
> > On Wed, Aug 29, 2012 at 4:03 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> > > A = load 'foo';
> > > B = group A all;
> > > C = foreach B generate COUNT(A);
> > >
> > > Alan.
> > > On Aug 29, 2012, at 3:51 PM, Mohit Anchlia wrote:
> > >
> > > > > How do I get a count of all the rows? All the examples of COUNT use
> > > > > by.
> > >
> > >