Pig >> mail # user >> Subtracting contents of two bags


James Newhaven 2013-01-22, 12:46
Bill Graham 2013-01-22, 15:53
Re: Subtracting contents of two bags
Bill's suggestion is good, but here is another approach that I think is
cleaner to read:

find_not_in_b = cogroup A by key OUTER, B by key;
not_in_b = foreach (filter find_not_in_b by IsEmpty(B)) generate flatten(A);
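
A minimal end-to-end sketch of both approaches, for comparison. The file names (a_users.txt, b_users.txt), the chararray schema, and the alias names are hypothetical placeholders, not taken from the original question. The statements are written one per line rather than nested.

```pig
-- Hypothetical inputs: one user id per line (file names are assumptions).
A = LOAD 'a_users.txt' AS (key:chararray);
B = LOAD 'b_users.txt' AS (key:chararray);

-- Approach 1: left outer join, then keep rows that found no match in B.
joined      = JOIN A BY key LEFT OUTER, B BY key;
unmatched   = FILTER joined BY B::key IS NULL;
only_a_join = FOREACH unmatched GENERATE A::key AS key;

-- Approach 2: cogroup, then keep groups whose B bag is empty.
grouped        = COGROUP A BY key, B BY key;
b_missing      = FILTER grouped BY IsEmpty(B);
only_a_cogroup = FOREACH b_missing GENERATE FLATTEN(A);

DUMP only_a_cogroup;
```

Either way, an id that appears more than once in A comes out more than once; apply DISTINCT to the result if each user should be listed only once.
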
On Tue, Jan 22, 2013 at 8:53 AM, Bill Graham <[EMAIL PROTECTED]> wrote:

> You can do a left outer join of A and B and then filter by B is null.
>
> http://pig.apache.org/docs/r0.10.0/basic.html#join-outer
>
> On Tue, Jan 22, 2013 at 4:46 AM, James Newhaven <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have two relations - A and B.  Both just contain user ids.
> >
> > I want to get a list of users who are in A but not in B.
> >
> > I am running Pig 0.9.1 and think this might be possible with the DIFF
> > function. I can see that DIFF requires one relation that contains the two
> > bags.
> >
> > How can I create a relation that contains two bags so it can be supplied to
> > the DIFF function?
> >
> > Any suggestions would be appreciated.
> >
> > Thanks,
> > James
> >
>
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [EMAIL PROTECTED] going forward.*
>