Pig >> mail # user >> Filter on contents of other dataset


Filter on contents of other dataset
Hi,

What would be the best way to write this script?
I have two datasets: a huge one, (hkey, hdata), and a small one, (skey). I want
to select all the records from the huge dataset for which F(hdata, skey) is true.
Please advise.

For example,
huge = load 'mydata' as (key:chararray, value:chararray);
small = load 'smalldata' as (skey:chararray);
h_s_cross = cross huge, small;
-- CONTAINS stands in for F here (e.g. a boolean UDF)
filtered = filter h_s_cross by CONTAINS(value, skey);
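
To make the intended result concrete, here is a minimal Python sketch (not Pig) of what I am after. It assumes F is a plain substring test and that a huge row should be kept when F holds for at least one skey; the function names and sample rows are made up for illustration only.

```python
def filter_by_small(huge, small, f):
    """Keep each (hkey, hdata) row for which f(hdata, skey) is true for some skey."""
    small_keys = list(small)  # assumption: the small dataset fits in memory
    return [
        (hkey, hdata)
        for hkey, hdata in huge
        if any(f(hdata, skey) for skey in small_keys)
    ]


def contains(hdata, skey):
    # Hypothetical stand-in for F: true when skey occurs inside hdata.
    return skey in hdata


# Made-up sample data, only to show the shape of the inputs.
huge = [("k1", "apple pie"), ("k2", "banana split"), ("k3", "cherry tart")]
small = ["apple", "cherry"]

result = filter_by_small(huge, small, contains)
print(result)
```

Unlike the cross above, this keeps each huge row at most once, even when several skeys match it, and it only scans the huge dataset a single time.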

Thanks,
Aniket
Mridul Muralidharan 2011-04-15, 03:29
Aniket Mokashi 2011-04-15, 03:40
Mridul Muralidharan 2011-04-15, 03:44
Alan Gates 2011-04-15, 16:13