Pig >> mail # user >> Count of all the rows


Re: Count of all the rows
I looked at the definition of a relation, which says:
A relation is a bag (more specifically, an outer bag).
If a relation is a bag, then what's the difference between a bag and a relation?
I am getting a bit confused by the definitions. In the example below, what would
be the relation, the tuple, and the bag?

(1,2,3,4)

Is 1,2,3,4 without the "(" a tuple? Then what is a relation or a bag?

On Wed, Aug 29, 2012 at 4:51 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> COUNT is a UDF that takes in a Bag and outputs a Long.
>
> Relations are not Bags, so that's one way of thinking about it. But of
> course, we could have coerced the syntax to make it work.
>
> I like to think of it as such:
>
> A foreach is a transformation on the rows of a relation. Thus, applying
> COUNT directly to a relation doesn't make any sense, since you're doing an
> aggregate transformation. This is why grouping is necessary. You're putting
> all of the rows of the relation into one row (with the catch-all key
> "all"), so that you can run a function on them.
>
> Don't know if that helps.
>
> 2012/8/29 Mohit Anchlia <[EMAIL PROTECTED]>
>
> > Thanks! Why is grouping necessary? Is it to send it to the reducer?
> >
> > On Wed, Aug 29, 2012 at 4:03 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> > > A = load 'foo';
> > > B = group A all;
> > > C = foreach B generate COUNT(A);
> > >
> > > Alan.
> > > On Aug 29, 2012, at 3:51 PM, Mohit Anchlia wrote:
> > >
> > > > How do I get a count of all the rows? All the examples of COUNT use
> > > > group by.
> > >
> > >
> >
>
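
To make the terms used in this thread concrete, here is a minimal sketch of the same script with the intermediate schema printed. It assumes 'foo' holds tab-separated rows of four integers such as (1,2,3,4); the field names a, b, c and d are hypothetical and only there for illustration.

```pig
-- A is a relation: an outer bag of tuples.
-- Each input row, e.g. (1,2,3,4), is one tuple in that bag.
-- The schema (a, b, c, d) is an assumption made for this sketch.
A = LOAD 'foo' AS (a:int, b:int, c:int, d:int);

-- GROUP ... ALL collapses the relation into a single tuple:
-- the key 'all' plus an inner bag, named A, holding every tuple of A.
B = GROUP A ALL;
DESCRIBE B;

-- FOREACH runs once per tuple of B (there is only one),
-- and COUNT is applied to the inner bag in that tuple.
C = FOREACH B GENERATE COUNT(A);
DUMP C;
```

Note that COUNT skips tuples whose first field is null; COUNT_STAR counts those as well, so it may be the better fit if every row must be included.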