Pig >> mail # user >> Lookup in a dataset


Re: Lookup in a dataset
You’ll want to use COGROUP.

Something like

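x = COGROUP input1 BY col3, input2 BY col4;

needed = FILTER x BY IsEmpty(input2);

-- unnest the surviving input1 rows from their groups
result = FOREACH needed GENERATE FLATTEN(input1);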
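
For reference, here is a minimal end-to-end sketch of the same COGROUP approach, run against the sample data from the question. The file names 'input1.tsv' and 'input2.tsv' and the int column types are assumptions made for illustration; they are not given in the thread. Treat it as one way to wire the pieces together, not as the only way to do it.

```pig
-- Hypothetical file names and int schemas; adjust to the real data.
input1 = LOAD 'input1.tsv' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
input2 = LOAD 'input2.tsv' USING PigStorage('\t') AS (col4:int);

-- One group per distinct key, holding a bag of matching rows from each input.
x = COGROUP input1 BY col3, input2 BY col4;

-- Keep only the keys that never occur in input2.col4.
needed = FILTER x BY IsEmpty(input2);

-- Turn each group's bag of input1 rows back into plain records.
result = FOREACH needed GENERATE FLATTEN(input1);

-- For the sample data this should yield (11,12,13) and (41,42,43);
-- the order of the rows is not guaranteed.
DUMP result;
```

Keys that occur only in input2 (12 and 45 in the sample) form groups with a non-empty input2 bag, so the filter drops them and they contribute nothing to the output.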
Thanks,

Aaron Zimmerman
Platform Engineer
Sprout Social
773.227.7528
@apzimmerman
sproutsocial.com

On November 14, 2013 at 1:19:46 AM, Swaroop Patra ([EMAIL PROTECTED]) wrote:

Hi All,  

I need a little help scripting the condition below.

I have 2 tab-separated input files. Let's call them input1 and input2.
input1  
---------  
col1 col2 col3  
input2  
--------  
col4  

I have to fetch the records from input1 whose col3 value is not present in
input2.col4.

e.g.  
input1  
----------  
11 12 13  
21 22 23  
31 32 33  
41 42 43  
input2
---------  
12  
23  
33  
45  
output  
---------  
11 12 13  
41 42 43  

As 13 (input1.row1.col3) and 43 (input1.row4.col3) are not present in input2.col4.

Thanks & Regards,  
Swaroop  