join with 2 skewed tables - a suggestion
Hey,

We noticed that the current skewed join supports only one skewed table and
assumes that the second table isn't skewed.
Please review this suggestion for a design that handles two skewed tables:

   - Sample both tables.
   - For each skewed key (one with many records in at least one table), build a
   surrogate key in a GFCross style. E.g. if this key has 3M records in the
   left table and 7M in the right table, and there are 100 reducers
   available, build a GFCross grid with dimensions sqrt(100*3/7) (about 6.5)
   and sqrt(100*7/3) (about 15.3), so that their product is the 100 reducers.
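
To make the arithmetic concrete, here is a rough Python sketch of the grid sizing and the surrogate keys. It is only an illustration of the idea, not Pig's implementation: the function names, the rounding rule, and the (key, row, col) surrogate layout are all assumptions made for this example.

```python
import math
import random


def grid_dimensions(left_count, right_count, reducers):
    """Ideal (fractional) grid sides for one skewed key; rows * cols == reducers."""
    rows = math.sqrt(reducers * left_count / right_count)
    cols = math.sqrt(reducers * right_count / left_count)
    return rows, cols


def rounded_grid(left_count, right_count, reducers):
    """Integer grid that fits in the reducer budget (assumed rounding rule)."""
    rows, _ = grid_dimensions(left_count, right_count, reducers)
    rows = max(1, min(reducers, round(rows)))
    cols = max(1, reducers // rows)
    return rows, cols


def left_surrogates(key, rows, cols, rng):
    """A left record picks one random row and is replicated to every column."""
    r = rng.randrange(rows)
    return [(key, r, c) for c in range(cols)]


def right_surrogates(key, rows, cols, rng):
    """A right record picks one random column and is replicated to every row."""
    c = rng.randrange(cols)
    return [(key, r, c) for r in range(rows)]


if __name__ == "__main__":
    # Numbers from the example: 3M left records, 7M right records, 100 reducers.
    print(grid_dimensions(3_000_000, 7_000_000, 100))
    rows, cols = rounded_grid(3_000_000, 7_000_000, 100)
    rng = random.Random(0)
    left = set(left_surrogates("k", rows, cols, rng))
    right = set(right_surrogates("k", rows, cols, rng))
    # Every left/right pair shares exactly one grid cell, so it is joined once.
    print(rows, cols, left & right)
```

With this split each cell receives roughly left_count/rows + right_count/cols records for the key, which is the quantity the square-root dimensions minimize when rows * cols is fixed.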

What do you say? Is this a necessary enhancement request? Or is it safe to
assume that only one table will be skewed in each join?

Thanks, Dudu and Ido

--
Sent from my androido