Pig >> mail # user >> trying to count all tuples


trying to count all tuples
Howdy,

I'm coming from Cassandra, and I'm actually trying to count all columns in a
column family.  I believe that is similar to counting the number of tuples in
a bag, in the lingo of the Pig manual.  It was harder than I expected, but I
think this works:

rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage()
    AS (key, columns: bag {T: tuple(name, value)});
counts = FOREACH rows GENERATE COUNT(columns);
counts_in_bag = GROUP counts ALL;
sum_of_bag = FOREACH counts_in_bag GENERATE SUM($1);
DUMP sum_of_bag;

My question is: am I right that it works?  I started with 3 keys having a
total of 5 columns and got (5).  Then I added a new key/column, plus another
column on an existing key, and got (7).  So it seems to be working.
But is there a better way to write it?
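
To make the check above concrete, here is a minimal Python sketch of the same dataflow. It is only an in-memory model, not Pig and not the Cassandra loader: the `rows` dictionary, the key and column names, and both function names are made up for illustration. It also assumes no column name is null, since Pig's COUNT skips tuples whose first field is null and plain `len` does not model that.

```python
# Hypothetical stand-in for the column family: key -> bag of (name, value) tuples.
rows = {
    "key1": [("a", 1), ("b", 2)],
    "key2": [("a", 3), ("c", 4)],
    "key3": [("d", 5)],
}


def count_all_columns(rows):
    """Count per row, then sum, mirroring the script's two steps."""
    # counts = FOREACH rows GENERATE COUNT(columns);
    counts = [len(columns) for columns in rows.values()]
    # GROUP counts ALL puts every per-row count in one bag; SUM adds them up.
    return sum(counts)


def count_all_columns_flat(rows):
    """Alternative shape: flatten every bag first, then count once."""
    return sum(1 for columns in rows.values() for _ in columns)


print(count_all_columns(rows))  # 3 keys, 5 columns in total

# Add a new key with one column, plus one more column on an existing key.
rows["key4"] = [("e", 6)]
rows["key1"].append(("f", 7))
print(count_all_columns(rows))
print(count_all_columns_flat(rows))
```

Both functions agree on this toy data, which suggests that counting per row and then summing is equivalent to flattening and counting once, at least when no nulls are involved.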

Thanks!

will
Dmitriy Ryaboy 2011-06-03, 20:06
William Oberman 2011-06-03, 20:09
William Oberman 2011-06-07, 20:33
William Oberman 2011-06-07, 20:58
William Oberman 2011-06-08, 20:56
Dmitriy Ryaboy 2011-06-08, 21:31