Pig, mail # user - SIZE() always leads to 1 reducer?

Yang 2013-04-11, 22:13
I set default_parallel to 15,

but when I ran

y = group z ALL;
x = foreach y generate SIZE(z);

these two lines generated a MapReduce job with only one reducer.
I guess that's because GROUP ALL puts every record into a single group, so SIZE() has to
count all of them in one place. But don't we have cumulative/additive (algebraic) UDFs
for this sort of thing? It would be faster if SIZE() could be parallelized.
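A toy Python model (not Pig or Hadoop code) may help show what is happening. It assumes Hadoop-style hash partitioning, with `zlib.crc32` standing in for Java's `hashCode()`, and uses made-up input splits. Because GROUP ALL gives every record the same key, all of it hashes to one reducer no matter what default_parallel says. Pig's builtin COUNT implements the Algebraic interface, so Pig can run a map-side combiner that sends just one partial count per map task to that reducer. The sketch assumes SIZE() is not algebraic in the Pig version in use, so every record crosses the shuffle.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Choose a reducer as hash(key) mod num_reducers, in the spirit of
    Hadoop's HashPartitioner. crc32 stands in for Java's hashCode()."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair emitted by the map tasks to a reducer."""
    reducers = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            reducers[partition(key, num_reducers)].append(value)
    return reducers


NUM_REDUCERS = 15  # mirrors default_parallel=15

# Hypothetical input: three map tasks, each reading one split of z.
splits = [["a", "b", "c"], ["d", "e"], ["f", "g", "h", "i"]]

# GROUP ALL gives every record the same key, modelled here as "all".
# Non-algebraic case (assumed for SIZE): every record crosses the shuffle.
raw = shuffle([[("all", rec) for rec in split] for split in splits], NUM_REDUCERS)

# Algebraic case (like COUNT): a map-side combiner emits one partial count
# per map task, and the single reducer only sums those partials.
partial = shuffle([[("all", len(split))] for split in splits], NUM_REDUCERS)
total = sum(sum(values) for values in partial.values())

print(len(raw), sum(len(v) for v in raw.values()))          # 1 9
print(len(partial), sum(len(v) for v in partial.values()))  # 1 3
print(total)                                                # 9
```

In this model the number of reducers that receive data stays at one in both cases; what the algebraic path saves is shuffle volume (3 partial counts instead of 9 records), while the final sum still happens in one place.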

Thanks,
Yang