RE: Set difference in Pig
I saw this somewhere. 'Anti-join' doesn't seem very descriptive to me, but that is what it was called.
Anti-join (set difference) idiom in Pig:
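A = load 'input1' as (x, y);
B = load 'input2' as (u, v);
-- one group per key, holding the bag of A rows and the bag of B rows with that key
C = cogroup A by x, B by u;
-- keep only the keys that have no rows in B
D = filter C by IsEmpty(B);
-- unnest the surviving A rows
E = foreach D generate flatten(A);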
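
To make it clearer what each step of the idiom does, here is a rough Python model of it. This is only a sketch of the semantics as I understand them, not how Pig executes the script; the function name anti_join and the sample rows are made up for illustration, and the join key is assumed to be the first field of each row (x in A, u in B).

```python
from collections import defaultdict

def anti_join(a_rows, b_rows):
    """Return the rows of a_rows whose first field matches no first field in b_rows."""
    # C = cogroup A by x, B by u;
    # one entry per key, holding a bag of A rows and a bag of B rows
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[row[0]][0].append(row)
    for row in b_rows:
        groups[row[0]][1].append(row)
    # D = filter C by IsEmpty(B);  E = foreach D generate flatten(A);
    return [row for a_bag, b_bag in groups.values() if not b_bag for row in a_bag]

# Hypothetical sample data standing in for 'input1' and 'input2'
input1 = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
input2 = [(2, "z"), (4, "w")]
print(sorted(anti_join(input1, input2)))  # [(1, 'a'), (3, 'd')]
```

Note that the match is on the key only: both rows of A with key 2 are dropped even though their second fields differ from the row in B. For a set difference over whole rows, the grouping would have to use all fields as the key.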
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853
-----Original Message-----
From: Deepak Singh [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 11, 2011 9:43 PM
Subject: Set difference in Pig

Can we do set difference in Pig?

The set difference is defined by:
A - B = {x : x is an element of A and x is not an element of B}