Pig, mail # user - Once more...


Re: Once more...
Jacob Perkins 2012-03-19, 20:03
Michael,

Why not just:

D = foreach (join C by dataPoint2, B by dataPoint1) generate
      B::dataPoint1, B::dataPoint2;

Does that get you what you need?
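
To sanity-check the join against the sample rows in the question, here is a minimal sketch in plain Python (not Pig). It only models what an inner join on C.dataPoint2 = B.dataPoint1 should return; the function name filter_then_join and the tuple layout are made up for illustration.

```python
# Sample rows from the question, as (dataPoint1, dataPoint2) tuples.
B = [(1, 2), (1, 4), (2, 8), (2, 1), (3, 7), (8, 7)]


def filter_then_join(rows, sample):
    """Model the FILTER followed by the inner join (hypothetical helper)."""
    # C = FILTER B BY dataPoint1 == sample
    c = [row for row in rows if row[0] == sample]
    # JOIN C BY dataPoint2, B BY dataPoint1:
    # an inner join emits one pair per matching (C row, B row) combination.
    joined = [(c_row, b_row) for c_row in c for b_row in rows
              if c_row[1] == b_row[0]]
    # GENERATE B::dataPoint1, B::dataPoint2 keeps only the B side of each pair.
    return [b_row for _, b_row in joined]


D = filter_then_join(B, 2)
print(sorted(D))
```

On the sample data this yields the three rows asked for. One caveat: if C contains the same dataPoint2 value more than once, the join repeats the matching B rows, so a DISTINCT on the result may be needed.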

--jacob
@thedatachef

On Mon, 2012-03-19 at 15:55 -0400, Michael Moore wrote:
> Really sorry folks.  Hotmail stinks.  In case this doesn't come through, I put it in a PasteBin: http://pastebin.com/fKxRvCpQ
>
> -Michael
>
> ---
>
> Hi All,
> I have a statement like this:
> -- A is omitted, loads data
> B = FOREACH A GENERATE FLATTEN(data1.b.v) as dataPoint1, FLATTEN(data2.b.v) as dataPoint2;
> C = FILTER B BY dataPoint1 == 'sampleDataPoint';
>
> I'd like to generate a new filter based on the results of C.  For instance, I'd like to do something like this:
> D = FILTER B BY dataPoint1 == C.dataPoint2;
>
> (This would find all rows in B whose dataPoint1 equals the dataPoint2 of some row in C, i.e. of a row whose dataPoint1 is 'sampleDataPoint'.)
>
> For example (format: dataPoint1,dataPoint2):
>
>
> B would return:
> 1,2
> 1,4
> 2,8
> 2,1
> 3,7
> 8,7
>
> If sampleDataPoint = 2, C would return:
> 2,8
> 2,1
>
> I'd like D to return:
> 1,2
> 1,4
> 8,7
>
> Is there a clever way to do this that I'm missing?  Thanks!
> -Mike