Pig >> mail # user >> trying to count all tuples


Re: trying to count all tuples
That is exactly what I wanted, thanks for the confirmation!

On Fri, Jun 3, 2011 at 4:06 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> I am not sure what you mean by "count all columns". The code you have
> counts all *cells*.
> So:
> id1: col1, col2
> id2: col1, col2, col3
>
> has 3 columns in a conventional sense, but your code will return 5. Is
> that what you want? If so, your code seems correct.
>
> D
>
> On Fri, Jun 3, 2011 at 12:53 PM, William Oberman
> <[EMAIL PROTECTED]> wrote:
> > Howdy,
> >
> > I'm coming from Cassandra, and I'm actually trying to count all columns in a
> > column family.  I believe that is similar to counting the number of tuples in a
> > bag, in the lingo of the Pig manual.  It was harder than I expected, but I
> > think this works:
> > rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage()
> > AS (key, columns: bag {T: tuple(name, value)});
> > counts = FOREACH rows GENERATE COUNT(columns);
> > counts_in_bag = GROUP counts ALL;
> > sum_of_bag = FOREACH counts_in_bag  GENERATE SUM($1);
> > dump sum_of_bag;
> >
> > My question is: am I right that it works?  I started with 3 keys having a
> > total of 5 columns and got (5).  Then I added a new key/column, and another
> > column on an existing key and got (7).  So, it seems like it's working.
> > But, was there a better way to write it?
> >
> > Thanks!
> >
> > will
> >
>

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue, First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) [EMAIL PROTECTED]
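
The difference between counting cells and counting columns "in a conventional sense", as discussed in this thread, can be illustrated outside Pig with a small Python sketch. This is only an in-memory model, not the Pig script itself: the `rows` list is a hypothetical stand-in for the relation loaded from Cassandra, built from the two-row example in the quoted reply, and the function names `count_cells` and `count_distinct_columns` are made up for illustration.

```python
# Hypothetical stand-in for the loaded relation: each row is
# (key, bag of (name, value) tuples). The values are placeholders.
rows = [
    ("id1", [("col1", "a"), ("col2", "b")]),
    ("id2", [("col1", "c"), ("col2", "d"), ("col3", "e")]),
]


def count_cells(rows):
    """Count every (name, value) pair across all rows.

    Models the approach in the thread: a per-row count of the bag,
    followed by one global sum over those counts.
    """
    per_row_counts = [len(columns) for _key, columns in rows]
    return sum(per_row_counts)


def count_distinct_columns(rows):
    """Count distinct column names across all rows (the 'conventional' count)."""
    names = {name for _key, columns in rows for name, _value in columns}
    return len(names)


total_cells = count_cells(rows)
distinct_columns = count_distinct_columns(rows)
print(total_cells, distinct_columns)  # prints "5 3"
```

On this sample data the cell count is 5 and the distinct column count is 3, matching the numbers given in the quoted reply.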