Pig, mail # user - Lookup in a dataset


Re: Lookup in a dataset
Aaron Zimmerman 2013-11-14, 12:07
You’ll want to use COGROUP.

Something like:

x = COGROUP input1 BY col3, input2 BY col4;

needed = FILTER x BY IsEmpty(input2);
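
To make the COGROUP suggestion above easier to try, here is a fuller sketch. The file names ('input1.tsv', 'input2.tsv'), the int column types, and the alias "result" are assumptions for illustration and are not from the original question. The final FLATTEN step is also an addition: after the FILTER, each remaining tuple still holds the group key plus two bags, so flattening the input1 bag is one way to get back plain input1 rows.

```pig
-- Assumed paths and schemas; adjust to the real files.
input1 = LOAD 'input1.tsv' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
input2 = LOAD 'input2.tsv' USING PigStorage('\t') AS (col4:int);

-- One tuple per distinct key: (group, {input1 rows}, {input2 rows}).
x = COGROUP input1 BY col3, input2 BY col4;

-- Keep only the keys that have no matching row in input2.
needed = FILTER x BY IsEmpty(input2);

-- Unnest the input1 bag to recover the original (col1, col2, col3) rows.
result = FOREACH needed GENERATE FLATTEN(input1);

DUMP result;
```

With the sample data from the question, this should leave the rows 11 12 13 and 41 42 43, although Pig does not guarantee the order in which they are printed.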
Thanks,

Aaron Zimmerman
Platform Engineer
Sprout Social
773.227.7528
@apzimmerman
sproutsocial.com

On November 14, 2013 at 1:19:46 AM, Swaroop Patra ([EMAIL PROTECTED]) wrote:

Hi All,  

I need a little help scripting the condition below.  

I have two tab-separated input files. Let's call them input1 and input2.  
input1  
---------  
col1 col2 col3  
input2  
--------  
col4  

I have to fetch the records from input1 whose col3 value is not present in  
input2.col4.  

e.g.  
input1  
----------  
11 12 13  
21 22 23  
31 32 33  
41 42 43  
input2  
---------  
12  
23  
33  
45  
output  
---------  
11 12 13  
41 42 43  

As 13 (input1.row1.col3) and 43 (input1.row4.col3) are not available in input2.col4.  

Thanks & Regards,  
Swaroop