Pig >> mail # user >> Once more...


Re: Once more...
Michael,

Why not just:

D = foreach (join C by dataPoint2, B by dataPoint1) generate
      B::dataPoint1, B::dataPoint2;

Does that get you what you need?
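
Spelled out as a full script, a minimal sketch might look like the following. It assumes the six sample rows from your mail sit in a comma-separated file; the file name 'b.txt', the chararray types, and the intermediate aliases K and J are made up for illustration and stand in for your real A/B. The DISTINCT step is an extra precaution that the one-liner does not have: without it, a dataPoint2 value that appears several times in C would repeat the matching rows of B.

```pig
-- Hypothetical input: 'b.txt' holds the sample rows 1,2 / 1,4 / 2,8 / 2,1 / 3,7 / 8,7.
B = LOAD 'b.txt' USING PigStorage(',')
        AS (dataPoint1:chararray, dataPoint2:chararray);

-- Same filter as in the original question, with '2' as the sample value.
C = FILTER B BY dataPoint1 == '2';

-- Keep one copy of each dataPoint2 found in C so the join cannot duplicate rows of B.
K0 = FOREACH C GENERATE dataPoint2;
K  = DISTINCT K0;

-- Inner join: only rows of B whose dataPoint1 matches some dataPoint2 in C survive.
J = JOIN K BY dataPoint2, B BY dataPoint1;

-- After a join the fields are prefixed with their source alias, hence B::.
D = FOREACH J GENERATE B::dataPoint1 AS dataPoint1, B::dataPoint2 AS dataPoint2;

DUMP D;
```

With the sample data this should leave D holding the rows (1,2), (1,4) and (8,7); Pig makes no promise about the order they come back in.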

--jacob
@thedatachef

On Mon, 2012-03-19 at 15:55 -0400, Michael Moore wrote:
> Really sorry folks.  Hotmail stinks.  In case this doesn't come through, I put it in a PasteBin: http://pastebin.com/fKxRvCpQ
>
> -Michael
>
> ---
>
> Hi All,
> I have a statement like this:
> -- A is omitted, loads data
> B = FOREACH A GENERATE FLATTEN(data1.b.v) as dataPoint1, FLATTEN(data2.b.v) as dataPoint2;
> C = FILTER B BY dataPoint1 == 'sampleDataPoint';
>
> I'd like to generate a new filter based on the results of C.  For instance, I'd like to do something like this:
> D = FILTER B BY dataPoint1 == C.dataPoint2;
>
> (This would look for all rows in B where dataPoint1 equals one of the dataPoint2 values from the rows that matched 'sampleDataPoint'.)
>
> For example (format: dataPoint1,dataPoint2):
>
>
> B would return:
> 1,2
> 1,4
> 2,8
> 2,1
> 3,7
> 8,7
>
> If 'sampleDataPoint' is 2, C would return:
> 2,8
> 2,1
>
> I'd like D to return:
> 1,2
> 1,4
> 8,7
>
> Is there a clever way to do this that I'm missing?  Thanks!
> -Mike