Pig, mail # dev - Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex


Re: Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex
Rohini Palaniswamy 2014-02-17, 08:01

I left that in on purpose. We want to try unsorted shuffle to reduce the number of stages when the data is small. For example, if there are 7K input splits and parallel is set to 100, then with 1-1 there will be 7K tasks in the load vertex, 7K tasks in the partition vertex and 100 in the join vertex. We want to see whether 7K in the load vertex, 3.5K in the partition vertex and 100 in the join vertex performs better. In theory it might, since each final join task only needs to merge 3.5K map outputs instead of 7K. If that does not work out, we will stick with 1-1.
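
To make the arithmetic concrete, here is a rough sketch of the task counts in the two layouts. The function name and dictionary keys are made up for illustration and are not Pig or Tez APIs. It assumes each join task fetches one output from every partition-vertex task, and it takes the 3.5K figure from the example above rather than deriving it.

```python
def vertex_task_counts(input_splits, parallel, partition_tasks):
    """Hypothetical helper: per-vertex task counts for the skewed join plan.

    Assumes every join task merges one output from each partition-vertex task.
    """
    return {
        "load": input_splits,
        "partition": partition_tasks,
        "join": parallel,
        "outputs_merged_per_join_task": partition_tasks,
        "total_tasks": input_splits + partition_tasks + parallel,
    }


# 1-1 edge: the partition vertex mirrors the load vertex, so 7K tasks each.
one_to_one = vertex_task_counts(7000, 100, partition_tasks=7000)

# Unsorted shuffle: the partition vertex can run with fewer tasks (3.5K in the example).
unsorted_shuffle = vertex_task_counts(7000, 100, partition_tasks=3500)

for name, counts in (("1-1", one_to_one), ("unsorted shuffle", unsorted_shuffle)):
    print(name, counts)
```

Under these assumptions the only differences are the partition vertex size and the merge fan-in of each join task. Whether that outweighs the cost of the extra shuffle is what we want to measure.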
- Rohini
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18181/#review34628
On Feb. 17, 2014, 7:34 a.m., Rohini Palaniswamy wrote: