Re: Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex
Rohini Palaniswamy 2014-02-17, 08:01
Left that on purpose. We want to try an unsorted shuffle to reduce the number of stages when the data is small. For example, if there are 7K input splits and parallel is set to 100, then with 1-1 there will be 7K tasks in the load vertex, 7K tasks in the partition vertex and 100 in the join vertex. We want to see whether 7K in the load vertex, 3.5K in the partition vertex and 100 in the join vertex performs better. In theory it might, since each final join task only needs to merge 3.5K map outputs instead of 7K. If that does not work out, we will stick with 1-1.
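To make the arithmetic concrete, here is a rough sketch of the two layouts being compared. It only counts tasks and is not Pig or Tez code; the function name `plan_summary` and its fields are made up for illustration, and it assumes every join task merges one output from each partition task.

```python
def plan_summary(input_splits, partition_tasks, join_parallelism):
    """Count tasks per vertex for a load -> partition -> join plan.

    Hypothetical helper for illustration only. Assumes each join task
    merges exactly one output from every partition task.
    """
    return {
        "load_tasks": input_splits,
        "partition_tasks": partition_tasks,
        "join_tasks": join_parallelism,
        "total_tasks": input_splits + partition_tasks + join_parallelism,
        "outputs_merged_per_join_task": partition_tasks,
    }


# 1-1 edge: the partition vertex gets one task per load task.
one_to_one = plan_summary(7000, 7000, 100)

# Unsorted shuffle: the partition vertex can run with fewer tasks (3.5K here).
unsorted_shuffle = plan_summary(7000, 3500, 100)

for name, plan in (("1-1", one_to_one), ("unsorted shuffle", unsorted_shuffle)):
    print(name, plan)
```

Whether the smaller merge in the join tasks outweighs the cost of the extra shuffle between load and partition is exactly what still has to be measured.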
On Feb. 17, 2014, 7:34 a.m., Rohini Palaniswamy wrote: