Pig >> mail # user >> Filter based on results of previous filter


Re: Filter based on results of previous filter
To get what you want, you need to do an (inner) join of B and C, using
the lhs and rhs of the equality (B's dataPoint1 and C's dataPoint2) as
the join keys.
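
A minimal sketch of that join, assuming B has already been reduced to two chararray columns. The LOAD line, the file name 'b.csv', and the aliases C_keys, C_uniq and J are hypothetical stand-ins introduced for illustration; in the original question B comes from a FOREACH ... FLATTEN over A, which is omitted. The DISTINCT step is an addition that was not part of the suggestion above; it is there so that repeated dataPoint2 values in C do not duplicate rows of B in the join output.

```pig
-- Hypothetical input: one "dataPoint1,dataPoint2" pair per line.
-- This stands in for the FOREACH ... FLATTEN step that builds B in the question.
B = LOAD 'b.csv' USING PigStorage(',')
        AS (dataPoint1:chararray, dataPoint2:chararray);

-- Same filter as in the question, with '2' as the sample data point.
C = FILTER B BY dataPoint1 == '2';

-- Keep only the join key from C and de-duplicate it (assumption: D should
-- contain each matching row of B once, however often the key occurs in C).
C_keys = FOREACH C GENERATE dataPoint2 AS key;
C_uniq = DISTINCT C_keys;

-- Inner join: lhs of the desired equality (B.dataPoint1) against the rhs (C.dataPoint2).
J = JOIN B BY dataPoint1, C_uniq BY key;

-- Drop the key column so D has the same shape as B.
D = FOREACH J GENERATE B::dataPoint1 AS dataPoint1, B::dataPoint2 AS dataPoint2;

DUMP D;
```

With the six example rows from the question as input, D should contain the rows (1,2), (1,4) and (8,7); the order in which DUMP prints them is not guaranteed.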

-Thejas

On 3/19/12 12:52 PM, Michael Moore wrote:
>
> Apologies for the formatting of the previous email.  Here's the question properly formatted:
> Hi All,
> I have a statement like this:
> -- A is omitted, loads data
> B = FOREACH A GENERATE FLATTEN(data1.b.v) as dataPoint1, FLATTEN(data2.b.v) as dataPoint2;
> C = FILTER B BY dataPoint1 == 'sampleDataPoint';
>
> I'd like to generate a new filter based on the results of C.  For instance, I'd like to do something like this:
>
> D = FILTER B BY dataPoint1 == C.dataPoint2;
>
> (This would find all rows in B whose dataPoint1 equals the dataPoint2 of some row in C, i.e. a row whose dataPoint1 is 'sampleDataPoint'.)
>
> For example (format: dataPoint1,dataPoint2), B would return:
> 1,2
> 1,4
> 2,8
> 2,1
> 3,7
> 8,7
>
> If sampleDataPoint = 2, C would return:
> 2,8
> 2,1
>
> I'd like D to return:
> 1,2
> 1,4
> 8,7
>
> Is there a clever way to do this that I'm missing?  Thanks!
> -Mike
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>> Subject: Filter based on results of previous filter
>> Date: Mon, 19 Mar 2012 15:49:10 -0400
>>
>>
>