Pig >> mail # user >> NOT IN and EXCEPT

On 8/21/10 10:45 AM, "Defenestrator" <[EMAIL PROTECTED]> wrote:

> I come from the DBMS world and am not really familiar with PIG, so hopefully
> I'm asking reasonable questions.
> I was basically wondering if there are patterns in PIG to do the following
> set operations:
> 1. select * from foo where foo.a NOT IN (select x from bar);
> 2. select a, b from foo EXCEPT select x, y from bar;

1 can be implemented as a left outer join followed by a filter on null.

In SQL it is equivalent to: select * from foo left outer join bar on (foo.a = bar.x) where bar.x is null;
In Pig Latin you can do:
J = join foo by a LEFT, bar by x ;
F = filter J by x is null;

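Or use cogroup:
CG = cogroup foo by a, bar by x;
F = filter CG by SIZE(bar) == 0;
-- each remaining group holds a bag of foo rows; flatten it to get the rows back
R = foreach F generate flatten(foo);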
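
To see what the cogroup-and-filter pattern produces on concrete rows, here is a small plain-Python model of it. It is only a sketch of the semantics, not Pig code: the sample data and the `cogroup` helper are made up for illustration, and NULL handling is ignored.

```python
from collections import defaultdict

# Hypothetical sample data: foo rows are (a, b) tuples, bar rows are (x, y) tuples.
foo = [(1, "p"), (1, "p"), (2, "q"), (3, "r")]
bar = [(2, "q"), (4, "s")]

def cogroup(left, left_key, right, right_key):
    """Illustrative helper: group both inputs by key, key -> (left rows, right rows)."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return groups

# Cogroup foo by a and bar by x.
cg = cogroup(foo, lambda r: r[0], bar, lambda r: r[0])

# Keep only groups whose bar side is empty, then flatten the foo side back into rows.
not_in = [
    row
    for key, (foo_rows, bar_rows) in cg.items()
    if not bar_rows
    for row in foo_rows
]
print(not_in)  # [(1, 'p'), (1, 'p'), (3, 'r')]
```

Note that the duplicate (1, 'p') row survives, which matches NOT IN: rows of foo are filtered, not de-duplicated.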

2. The difference between 'not in' and 'except' is that for 'except' you first
do a distinct on the selected columns of foo.
foo_ab = foreach foo generate a,b;
distinct_foo = distinct foo_ab;
CG = cogroup distinct_foo by (a,b), bar by (x,y);
F = filter CG by SIZE(bar) == 0;
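
The same kind of plain-Python sketch shows how EXCEPT differs. Again, this only models the semantics on made-up sample data; it uses a set lookup rather than a cogroup, and it ignores NULL handling.

```python
# Hypothetical sample data: foo rows are (a, b) tuples, bar rows are (x, y) tuples.
foo = [(1, "p"), (1, "p"), (2, "q"), (3, "r")]
bar = [(2, "q"), (3, "z")]

# Step 1: project (a, b) and de-duplicate; dict.fromkeys keeps first-seen order.
distinct_foo = list(dict.fromkeys((a, b) for a, b in foo))

# Step 2: drop every (a, b) pair that also appears in bar as (x, y).
bar_pairs = set(bar)
except_rows = [row for row in distinct_foo if row not in bar_pairs]
print(except_rows)  # [(1, 'p'), (3, 'r')]
```

Here (1, 'p') appears only once because of the distinct, and (3, 'r') is kept because the comparison is on the whole (a, b) pair, not on a alone.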