Pig >> mail # user >> Lookup in a dataset


Re: Lookup in a dataset
Thanks, Aaron, for the reply.

I will try this out.

Thanks,
Swaroop

On 14-Nov-2013, at 5:37 pm, Aaron Zimmerman <[EMAIL PROTECTED]> wrote:

> You’ll want to use COGROUP.
>
> Something like
>
> x = COGROUP input1 by col3, input2 by col4;
>
> needed = FILTER x by IsEmpty(input2);
>
>
> Thanks,
>
> Aaron Zimmerman
> Platform Engineer
> Sprout Social
> 773.227.7528
> @apzimmerman
> sproutsocial.com
>
> On November 14, 2013 at 1:19:46 AM, Swaroop Patra ([EMAIL PROTECTED]) wrote:
>
>> Hi All,
>>
>> I need a little help scripting the condition below.
>>
>> I have 2 tab-separated input files. Let's call them input1 and input2.
>> input1
>> ---------
>> col1 col2 col3
>> input2
>> --------
>> col4
>>
>> I have to fetch the records from input1 whose col3 value is not present in
>> input2.col4.
>>
>> e.g.
>> input1
>> ----------
>> 11 12 13
>> 21 22 23
>> 31 32 33
>> 41 42 43
>> input2
>> ---------
>> 12
>> 23
>> 33
>> 45
>>
>>
>> output
>> ---------
>> 11 12 13
>> 41 42 43
>>
>> As 13 (input1.row1.col3) & 43 (input1.row4.col3) are not available in input2.col4.
>>
>> Thanks & Regards,
>> Swaroop
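
For reference, the COGROUP suggestion above can be sketched end to end as follows. This is only a sketch under assumptions: the file paths 'input1' and 'input2', the int column types, and the final FLATTEN step are not from the thread. The reply stops at the FILTER, which still leaves each row in grouped form (key, bag of input1 rows, bag of input2 rows), so one more step seems needed to get back the original input1 records.

```pig
-- Assumed paths and schemas (hypothetical); adjust to the real files.
input1 = LOAD 'input1' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
input2 = LOAD 'input2' USING PigStorage('\t') AS (col4:int);

-- Group both relations on the lookup key. Each output tuple is
-- (group, {matching input1 rows}, {matching input2 rows}).
x = COGROUP input1 BY col3, input2 BY col4;

-- Keep only the keys that have no matching row in input2.
needed = FILTER x BY IsEmpty(input2);

-- Assumed extra step: unnest the input1 bag to recover (col1, col2, col3).
result = FOREACH needed GENERATE FLATTEN(input1);

DUMP result;
```

With the sample data from the question, this should leave the rows 11 12 13 and 41 42 43, in no guaranteed order. Keys that appear only in input2 (12 and 45 here) never pass the filter, because their input2 bag is not empty.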
