Re: Left joins with != condition
If I understand correctly, this is nothing more than an anti-join, which
can be done in Pig using a cogroup.

So your SQL below:

> select * from yee a left join yer b on a.loc != b.loc;

becomes something like:

a = load 'yee' as (loc:chararray, stuff:int);
b = load 'yer' as (loc:chararray, stuff:int);

c = cogroup a by loc, b by loc;
d = foreach (filter c by IsEmpty(b)) generate FLATTEN(a);

which will result in d containing only the records from a whose 'loc'
value does not appear in the 'loc' field of any record in b.
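
To make the cogroup/filter/flatten steps easier to follow, here is a rough Python sketch of what they do under the anti-join reading assumed above. It is not Pig and not how Pig runs the job. The helper names (cogroup, anti_join) and the sample rows are made up for illustration only.

```python
from collections import defaultdict

def cogroup(a, b, key=lambda rec: rec[0]):
    """Emulate `cogroup a by loc, b by loc`: one (bag_a, bag_b) pair per key."""
    groups = defaultdict(lambda: ([], []))
    for rec in a:
        groups[key(rec)][0].append(rec)
    for rec in b:
        groups[key(rec)][1].append(rec)
    return groups

def anti_join(a, b):
    """Keep the records of a whose key has an empty bag on the b side."""
    result = []
    for bag_a, bag_b in cogroup(a, b).values():
        if not bag_b:             # corresponds to: filter c by IsEmpty(b)
            result.extend(bag_a)  # corresponds to: generate FLATTEN(a)
    return result

# Hypothetical sample data standing in for 'yee' and 'yer': (loc, stuff)
yee = [("nyc", 1), ("sfo", 2), ("lax", 3)]
yer = [("nyc", 10), ("bos", 20)]

d = anti_join(yee, yer)
print(sorted(d))
```

With this sample data, only the 'sfo' and 'lax' records survive, since 'nyc' also occurs in yer. The 'bos' group is dropped as well, because its bag for a is empty and FLATTEN of an empty bag produces no rows. Pig does not guarantee output order, hence the sort before printing.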