Pig, mail # user - join with 2 skewed tables - a suggestion

Ido Hadanny 2013-06-17, 14:24

We noticed that the current skewed join supports only one skewed table and
assumes that the second table isn't skewed.
Please review this suggestion for a design that handles two skewed tables:

   - Sample both tables.
   - For each skewed key (one with many records in at least one table), build a
   surrogate key in the GFCross style. For example, if this key has 3M records
   in the left table and 7M in the right table, and 100 reducers are
   available, build a GFCross grid with dimensions sqrt(100*3/7) (about 6.5)
   by sqrt(100*7/3) (about 15.3), so that the two dimensions multiply to the
   100 reducers and keep the 3:7 ratio between the tables.
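
To make the arithmetic concrete, here is a rough Python sketch of the grid sizing and surrogate key assignment for one skewed key. It is only an illustration of the idea, not Pig code: the function names (grid_dims, left_surrogates, right_surrogates) are hypothetical, and the rounding rule (round the left dimension down, then fit the right dimension into the remaining reducers) is an assumption, since the ideal dimensions are usually not integers.

```python
import math
import random


def grid_dims(left_count, right_count, reducers):
    """Pick integer grid dimensions (rows, cols) for one skewed key.

    rows is the number of slices for the left table's records and cols the
    number of slices for the right table's records. The ideal values are
    sqrt(reducers * left / right) and sqrt(reducers * right / left); the
    rounding below is an assumption made for this sketch.
    """
    if left_count <= 0 or right_count <= 0 or reducers <= 0:
        raise ValueError("counts and reducers must be positive")
    ideal_rows = math.sqrt(reducers * left_count / right_count)
    # Round down, but keep at least 1 row and at most `reducers` rows.
    rows = min(reducers, max(1, int(ideal_rows)))
    # Use as many columns as still fit, so rows * cols <= reducers.
    cols = reducers // rows
    return rows, cols


def left_surrogates(key, rows, cols, rng=random):
    """A left record picks one random row and is copied to every column."""
    row = rng.randrange(rows)
    return [(key, row, col) for col in range(cols)]


def right_surrogates(key, rows, cols, rng=random):
    """A right record picks one random column and is copied to every row."""
    col = rng.randrange(cols)
    return [(key, row, col) for row in range(rows)]
```

With the numbers from the example, grid_dims(3_000_000, 7_000_000, 100) gives a 6 x 16 grid, which uses 96 of the 100 reducers. Any left record and any right record with the same key share exactly one (key, row, col) cell, so each matching pair is joined once.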

What do you say? Is this a necessary enhancement request? Or is it safe to
assume that only one table will be skewed in each join?

Thanks, Dudu and Ido
