COUNT(A.field1)
I'm wondering about the performance of COUNT in a script like this:
A = LOAD 'test.csv' AS (a1, a2, a3);
B = GROUP A BY a1;
-- Which of these is preferred?
C = FOREACH B GENERATE COUNT(A);
-- Or would this send only a single field through COUNT and perform better?
C = FOREACH B GENERATE COUNT(A.a2);
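
One way to compare the two variants might be to give them separate aliases and look at the plans Pig generates for each. This is only a sketch: the aliases C_whole and C_field are made up for illustration, and it makes no claim about which form is faster. Note that the two are not strictly equivalent, since COUNT skips a tuple when its first field is null, so the counts can differ if a1 or a2 contains nulls.

```pig
-- Sketch only: C_whole and C_field are hypothetical aliases, used so that
-- both variants can be inspected side by side.
A = LOAD 'test.csv' AS (a1, a2, a3);
B = GROUP A BY a1;

-- Counts the tuples in each group; tuples whose first field (a1) is null are skipped.
C_whole = FOREACH B GENERATE group, COUNT(A);

-- Counts the tuples in each group whose a2 is non-null.
C_field = FOREACH B GENERATE group, COUNT(A.a2);

-- Print the plans Pig generates for each variant.
EXPLAIN C_whole;
EXPLAIN C_field;
```

If nulls should be included in the count, COUNT_STAR can be used instead of COUNT.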