

Join with 2 skewed tables - a suggestion
Hey,
We noticed that the current skewed join supports only one skewed table and assumes that the second table isn't skewed. Please review this suggestion for a design that handles two skewed tables:

- Sample both tables.
- For each skewed key (one with many records in at least one table), build a surrogate key in a GFCross style.
- E.g. if for this key there are 3M records from the left table and 7M from the right table, and there are 100 reducers available, build GFCross with dimensions of sqrt(100*3/7) and sqrt(100*7/3), i.e. roughly 6.5 and 15.3.

What do you say? Is this a necessary enhancement request? Or is it safe to assume that only one table will be skewed in each join?

Thanks,
Dudu and Ido
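
To make the grid sizing concrete, here is a rough Python sketch of the arithmetic. It is not Pig code: grid_dims, left_cells and right_cells are hypothetical names, and rounding the ideal dimension to the nearest integer is an assumption made only for illustration. The idea is that a left record picks one random row and is copied to every column, a right record picks one random column and is copied to every row, so each left/right pair meets in exactly one cell.

```python
import math
import random


def grid_dims(left_count, right_count, reducers):
    """Return (rows, cols) with rows * cols <= reducers.

    The ideal (real-valued) dimensions are sqrt(reducers * left / right)
    and sqrt(reducers * right / left); rounding to integers is an
    assumption of this sketch, not something taken from Pig.
    """
    if left_count <= 0 or right_count <= 0 or reducers <= 0:
        raise ValueError("counts and reducers must be positive")
    ideal_rows = math.sqrt(reducers * left_count / right_count)
    rows = min(reducers, max(1, round(ideal_rows)))
    cols = max(1, reducers // rows)
    return rows, cols


def left_cells(rows, cols, rng=random):
    """Surrogate keys for one left record: one random row, all columns."""
    row = rng.randrange(rows)
    return [(row, col) for col in range(cols)]


def right_cells(rows, cols, rng=random):
    """Surrogate keys for one right record: all rows, one random column."""
    col = rng.randrange(cols)
    return [(row, col) for row in range(rows)]


if __name__ == "__main__":
    # The example from the post: 3M left records, 7M right records, 100 reducers.
    rows, cols = grid_dims(3_000_000, 7_000_000, 100)
    print(rows, cols)
    # Any left record and any right record share exactly one grid cell.
    shared = set(left_cells(rows, cols)) & set(right_cells(rows, cols))
    print(len(shared))
```

With integer rounding the grid may use slightly fewer reducers than are available; whether to round up, down, or search nearby factor pairs is a design choice left open here.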
Alan Gates 20130619, 20:09


