Re: NOT IN and EXCEPT

On 8/21/10 10:45 AM, "Defenestrator" <[EMAIL PROTECTED]> wrote:

> I come from the DBMS world and am not really familiar with PIG, so hopefully
> I'm asking reasonable questions.
>
> I was basically wondering if there are patterns in PIG to do the following
> set operations:
>
> 1. select * from foo where foo.a NOT IN (select x from bar);
> 2. select a, b from foo EXCEPT select x, y from bar;

1 can be implemented as a left outer join followed by a filter for nulls.

In SQL it is equivalent to: select * from foo left outer join bar on (foo.a = bar.x) where bar.x is null;
 
In Pig Latin you can do:
J = join foo by a LEFT, bar by x;
F = filter J by x is null;

Or use cogroup:
CG = cogroup foo by a, bar by x;
F = filter CG by SIZE(bar) == 0;
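
The filtered cogroup result F still holds one record per key, with the matching foo rows packed into a bag. As a minimal sketch of the whole pipeline, the bag can be flattened to get the original foo rows back. The file names 'foo.txt' and 'bar.txt', the int column types, and the alias not_in are assumptions for illustration, not part of the original question:

```pig
-- Hypothetical inputs: file names and int schemas are assumptions for illustration.
foo = LOAD 'foo.txt' AS (a:int, b:int);
bar = LOAD 'bar.txt' AS (x:int, y:int);

-- Group both relations on the comparison key.
CG = COGROUP foo BY a, bar BY x;

-- Keep only the keys that have no matching rows in bar.
F = FILTER CG BY SIZE(bar) == 0;

-- Unpack the foo bag so each output record is an original foo row.
not_in = FOREACH F GENERATE FLATTEN(foo);
DUMP not_in;
```

The built-in IsEmpty(bar) should work as an equivalent test in the filter. With the join variant, the output also carries the all-null bar columns, which a foreach can project away.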

2. The difference between 'NOT IN' and 'EXCEPT' is that EXCEPT returns distinct rows, so you first project the columns of foo and do a distinct on them:
foo_ab = foreach foo generate a,b;
distinct_foo = distinct foo_ab;
CG = cogroup distinct_foo by (a,b), bar by (x,y);
F = filter CG by SIZE(bar) == 0;
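
To turn F back into (a, b) rows, one option is to flatten the group key, since each surviving group corresponds to exactly one distinct (a, b) pair. A sketch under the same assumptions as before: the input files, the int types, and the alias except_rows are hypothetical:

```pig
-- Hypothetical inputs: file names and int schemas are assumptions for illustration.
foo = LOAD 'foo.txt' AS (a:int, b:int);
bar = LOAD 'bar.txt' AS (x:int, y:int);

-- Project and de-duplicate the left-hand side, as EXCEPT returns distinct rows.
foo_ab = FOREACH foo GENERATE a, b;
distinct_foo = DISTINCT foo_ab;

-- Group on the full row and keep only rows with no counterpart in bar.
CG = COGROUP distinct_foo BY (a, b), bar BY (x, y);
F = FILTER CG BY SIZE(bar) == 0;

-- The group key is the (a, b) tuple; flatten it into two columns.
except_rows = FOREACH F GENERATE FLATTEN(group) AS (a, b);
DUMP except_rows;
```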