Pig, mail # user - GROUP ALL Partitioning - 2014-01-23, 19:38
GROUP ALL Partitioning
Hi there,

Just curious: can anyone provide a quick explanation of, or a link to the
source code for, how Pig partitions data on a GROUP alias ALL operation?
We're seeing some odd behaviour, likely caused by skew in our data, and I'd
like to know how Pig assigns groups to reducers when there's no group key.

We've gotten around this already by providing our own partition key to
reduce skew.
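
To illustrate the kind of skew described above, here is a minimal sketch in
Python. It is not Pig's actual code. It assumes Hadoop-style hash partitioning
(non-negative key hash modulo the number of reducers) and assumes GROUP ALL
behaves like grouping on the single constant key "all". The function names,
the reducer count and the salt count are all hypothetical, chosen only for
illustration.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit int (ASCII input only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Assumed partitioning rule: non-negative hash modulo reducer count."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def reducer_load(keys, num_reducers: int) -> Counter:
    """Count how many records each reducer index would receive."""
    return Counter(partition(k, num_reducers) for k in keys)


NUM_REDUCERS = 4    # hypothetical cluster setting
NUM_RECORDS = 1000  # hypothetical input size
NUM_SALTS = 16      # hypothetical number of salt buckets

# One constant key: every record hashes to the same reducer.
constant_keys = ["all"] * NUM_RECORDS

# Salted key: append a small bucket id so records spread across reducers.
salted_keys = [f"all_{i % NUM_SALTS}" for i in range(NUM_RECORDS)]

constant_load = reducer_load(constant_keys, NUM_REDUCERS)
salted_load = reducer_load(salted_keys, NUM_REDUCERS)

print("constant key:", dict(constant_load))
print("salted key:  ", dict(salted_load))
```

With a salted key, each bucket would be aggregated separately and the partial
results combined in a second, much smaller step.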


Mike Sukmanowsky

Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY  10018
p: +1 (416) 953-4248
