Pig >> mail # user >> join with 2 skewed tables - a suggestion
join with 2 skewed tables - a suggestion
Hey,

We noticed that the current skewed join supports only one skewed table and
assumes that the second table isn't skewed.
Please review this suggestion for a design that handles two skewed tables:

   - Sample both tables.
   - For each skewed key (one with many records in at least one table), build a
   surrogate key in a GFCross style. E.g. if this key has 3M records in the
   left table and 7M in the right table, and 100 reducers are available,
   build a GFCross grid with dimensions sqrt(100*3/7) and sqrt(100*7/3)
   (about 6.5 by 15.3, so the product is the 100 reducers).
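
To make the arithmetic concrete, here is a minimal Python sketch of the sizing and the surrogate keys. It is only an illustration, not Pig code: the function names are made up, and it assumes the left table is split across the rows of the grid and the right table across the columns, with dimensions rounded so that rows * cols never exceeds the reducer count.

```python
import math
import random


def grid_dimensions(left_count, right_count, reducers):
    """Return (rows, cols) for one skewed key, with rows * cols <= reducers.

    Assumption: the ideal sizes are sqrt(reducers * left / right) rows and
    sqrt(reducers * right / left) columns, rounded to usable integers.
    """
    ideal_rows = math.sqrt(reducers * left_count / right_count)
    rows = max(1, min(reducers, round(ideal_rows)))
    cols = max(1, reducers // rows)
    return rows, cols


def left_surrogates(key, rows, cols, rng=random):
    """A left record picks one random row and is copied to every column."""
    row = rng.randrange(rows)
    return [(key, row, col) for col in range(cols)]


def right_surrogates(key, rows, cols, rng=random):
    """A right record picks one random column and is copied to every row."""
    col = rng.randrange(cols)
    return [(key, row, col) for row in range(rows)]


if __name__ == "__main__":
    rows, cols = grid_dimensions(3_000_000, 7_000_000, 100)
    print(rows, cols)
    print(left_surrogates("hot_key", rows, cols)[:3])
```

With the numbers from the example this rounds to a 7 x 14 grid, which uses 98 of the 100 reducers. Any left record and any right record with the same key share exactly one (key, row, col) surrogate, so each matching pair is joined once.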

What do you think? Is this enhancement worth requesting, or is it safe to
assume that only one table will be skewed in each join?

Thanks, Dudu and Ido
