Pig user mailing list


SIZE() always leads to 1 reducer?
I set default_parallel=15

but when I ran

y = group z ALL;
x = foreach y generate SIZE(z);

these two lines generate an MR job with only 1 reducer.
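The following is a minimal Python sketch, not Pig's actual partitioner. It assumes that records are routed to reducers by hashing the group key modulo the number of reducers, as Hadoop's default hash partitioning does. `partition` and `reducers_used` are hypothetical helpers. With `GROUP z ALL`, every record carries the same constant key, so every record lands in one partition no matter how high default_parallel is set.

```python
def partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner: reducer index from the key's hash.
    return hash(key) % num_reducers

def reducers_used(keys, num_reducers):
    # Set of reducer indices that receive at least one record.
    return {partition(k, num_reducers) for k in keys}

# GROUP z ALL: every record is tagged with the same constant key.
all_keys = ["all"] * 1000
# GROUP z BY user_id (hypothetical field): keys vary across records.
by_keys = list(range(1000))

print(len(reducers_used(all_keys, 15)))  # 1
print(len(reducers_used(by_keys, 15)))   # 15
```

Raising the parallelism only adds reducers that receive no data. A single key means a single reduce task.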
I guess it's because SIZE() needs to count all the groups, but don't we
have some sort of cumulative/additive UDFs?
It would be faster if we could parallelize SIZE().
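The next example is a minimal Python sketch of the cumulative/additive idea. It is loosely modeled on the initial/intermediate/final split used by Pig's `Algebraic` UDF interface, which built-ins such as COUNT implement. Each map task emits partial counts, a combiner sums them locally, and the single reducer only adds up the per-task subtotals. The three input splits are hypothetical. Under `GROUP ALL` there is still only one reducer, but it receives one number per map task instead of every record.

```python
def initial(record):
    # Map side: each record contributes a partial count of 1.
    return 1

def intermediate(partials):
    # Combiner: sum the partial counts produced within one map task.
    return sum(partials)

def final(partials):
    # Reducer: sum the per-map-task subtotals into the overall total.
    return sum(partials)

def algebraic_count(splits):
    # Each split stands for the records read by one hypothetical map task.
    subtotals = [intermediate(initial(r) for r in split) for split in splits]
    # All subtotals still go to one reducer, which sees only len(splits) values.
    return final(subtotals), len(subtotals)

splits = [list(range(0, 400)), list(range(400, 700)), list(range(700, 1000))]
total, values_at_reducer = algebraic_count(splits)
print(total)              # 1000
print(values_at_reducer)  # 3
```

The design choice is that counting is associative. Partial results can be summed in any grouping, so most of the work moves to the map side even though the final step stays serial.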

thanks
Yang
Mark Wagner 2013-04-12, 00:52