Re: trying to count all tuples
That is exactly what I wanted, thanks for confirming!

On Fri, Jun 3, 2011 at 4:06 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> I am not sure what you mean by "count all columns". The code you have
> counts all *cells*.
> So:
> id1: col1, col2
> id2: col1, col2, col3
>
> has 3 columns in a conventional sense, but your code will return 5. Is
> that what you want? If so, your code seems correct.
>
> D
>
> On Fri, Jun 3, 2011 at 12:53 PM, William Oberman
> <[EMAIL PROTECTED]> wrote:
> > Howdy,
> >
> > I'm coming from Cassandra, and I'm actually trying to count all columns in a
> > column family.  I believe that is similar to counting the number of tuples in a
> > bag, in the lingo of the Pig manual.  It was harder than I expected, but I
> > think this works:
> > rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage()
> >     AS (key, columns: bag {T: tuple(name, value)});
> > counts = FOREACH rows GENERATE COUNT(columns);
> > counts_in_bag = GROUP counts ALL;
> > sum_of_bag = FOREACH counts_in_bag  GENERATE SUM($1);
> > dump sum_of_bag;
> >
> > My question is: am I right that it works?  I started with 3 keys having a
> > total of 5 columns and got (5).  Then I added a new key/column, and another
> > column on an existing key and got (7).  So, it seems like it's working.
> > But, was there a better way to write it?
> >
> > Thanks!
> >
> > will
> >
>
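
The difference Dmitriy describes between counting cells and counting columns "in a conventional sense" can be sketched outside Pig. The Python below is only an illustration, not part of the original exchange: the `rows` dictionary is a hypothetical in-memory stand-in for the column family using his id1/id2 example, the values are made up, and the helper names `count_cells` and `count_distinct_columns` are assumptions introduced here.

```python
# Hypothetical stand-in for the column family: row key -> list of (name, value).
# The values are placeholders; only the column names matter for the counts.
rows = {
    "id1": [("col1", "a"), ("col2", "b")],
    "id2": [("col1", "c"), ("col2", "d"), ("col3", "e")],
}


def count_cells(rows):
    """Mirror the Pig script: COUNT(columns) per row, then GROUP ALL and SUM."""
    per_row_counts = [len(columns) for columns in rows.values()]
    return sum(per_row_counts)


def count_distinct_columns(rows):
    """Count unique column names across all rows (the 'conventional' count)."""
    return len({name for columns in rows.values() for name, _value in columns})


if __name__ == "__main__":
    print("cells:", count_cells(rows))
    print("distinct columns:", count_distinct_columns(rows))
```

With this sample data the first function gives 5 and the second gives 3, matching the numbers in the reply above. The Pig script in the original message corresponds to `count_cells`.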

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) [EMAIL PROTECTED]