Pig >> mail # user >> Filter on contents of other dataset

Filter on contents of other dataset

What would be the best way to write this script?
I have two datasets: huge (hkey, hdata) and small (skey). I want to keep
all the records from the huge dataset for which F(hdata, skey) is true.
Please advise.

For example,
-- CONTAINS stands in for F here (assumed to be a boolean UDF).
huge = load 'mydata' as (key:chararray, value:chararray);
small = load 'smalldata' as (skey:chararray);
h_s_cross = cross huge, small;
filtered = filter h_s_cross by CONTAINS(value, skey);

Mridul Muralidharan 2011-04-15, 03:29
Aniket Mokashi 2011-04-15, 03:40
Mridul Muralidharan 2011-04-15, 03:44
Alan Gates 2011-04-15, 16:13