Pig >> mail # user >> trying to count all tuples

trying to count all tuples

I'm coming from Cassandra, and I'm trying to count all the columns in a
column family.  I believe that is similar to counting the number of tuples in a
bag, in the lingo of the Pig manual.  It was harder than I expected, but I
think this works:
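-- Load each row as a key plus a bag of (name, value) column tuples.
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage()
    AS (key, columns: bag {T: tuple(name, value)});
-- Count the columns in each row.
counts = FOREACH rows GENERATE COUNT(columns) AS num_columns;
-- Put all per-row counts into a single group, then sum them.
counts_in_bag = GROUP counts ALL;
sum_of_bag = FOREACH counts_in_bag GENERATE SUM(counts.num_columns);
DUMP sum_of_bag;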

My question is: am I right that it works?  I started with 3 keys having a
total of 5 columns and got (5).  Then I added a new key with one column, and
another column on an existing key, and got (7).  So it seems to be working.
But is there a better way to write it?
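
For anyone who wants to check the arithmetic without a cluster, here is a rough Python model of what the script above appears to do: count the columns in each row, then sum the per-row counts. The row keys, column names, and the way the 5 columns are spread over the 3 keys are made up for illustration, and the model assumes no column has a null name (as far as I know, Pig's COUNT skips tuples whose first field is null).

```python
# Hypothetical in-memory stand-in for the column family:
# row key -> list of (name, value) columns. All keys and values are invented.
rows = {
    "key1": [("a", 1), ("b", 2)],
    "key2": [("c", 3), ("d", 4)],
    "key3": [("e", 5)],
}


def total_columns(rows):
    """Model of the script: per-row COUNT, then GROUP ALL and SUM."""
    # FOREACH rows GENERATE COUNT(columns): one count per row.
    counts = [len(columns) for columns in rows.values()]
    # GROUP counts ALL followed by SUM: add the per-row counts together.
    return sum(counts)


# 3 keys holding 5 columns in total.
before = total_columns(rows)

# Add one new key with one column, and one more column on an existing key.
rows["key4"] = [("f", 6)]
rows["key1"].append(("g", 7))
after = total_columns(rows)

print(before, after)
```

The two totals correspond to the (5) and (7) results reported above, which is consistent with the script summing per-row column counts.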


Dmitriy Ryaboy 2011-06-03, 20:06
William Oberman 2011-06-03, 20:09
William Oberman 2011-06-07, 20:33
William Oberman 2011-06-07, 20:58
William Oberman 2011-06-08, 20:56
Dmitriy Ryaboy 2011-06-08, 21:31