Re: STREAM in foreach block
Dan Young 2012-09-18, 03:27
I believe these are the ops supported in a nested foreach:
CROSS, DISTINCT, FILTER, FOREACH, LIMIT, and ORDER BY. See:
http://pig.apache.org/docs/r0.10.0/basic.html#foreach

On Sep 17, 2012 1:55 PM, "Kannan Shah" <[EMAIL PROTECTED]> wrote:
> I'm trying to group tuples by a key, sort by another key within each group,
> and then pass the sorted list of tuples for each group to a perl script. I
> need to use the perl script because I need to compute an aggregate quantity
> that is dependent on the sort order, and I'm not much of a Java programmer,
> so I don't know how to write a user-defined aggregate function.
>
> Doing this requires me to use STREAM in a foreach block, after the GROUP
> statement. Basically something like:
>
> r2 = group r1 by key1 ;
> r3 = foreach r2 {
>     s1=r1;
>     s2=order s1 by key2;
>     s3=stream s2 through myperlscript as (x,y,z);
>     generate group,flatten(s3.x),flatten(s3.y),flatten(s3.z);
> }
> store r3 into "r3.out" using PigStorage(';');
>
> NOTE: The FLATTENs are there only for syntactic reasons; myperlscript will
> only output one tuple for each group.
>
> I'm getting errors that make me think that you can't use the STREAM operator
> within a foreach block, but I'm not sure. Can someone confirm? Is there a
> workaround to this sort of situation?
>
> Any help appreciated,
> Kannan
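
One possible workaround, as a rough sketch only: since ORDER BY is allowed inside the nested block, the sorted bag could be handed to a UDF in the GENERATE clause instead of to STREAM. Pig 0.10 can register UDFs written in Python (Jython), which avoids Java. The file name sorted_agg.py and the function name sorted_agg below are hypothetical; the function would have to be written to take the sorted bag and return one (x, y, z) tuple, with its output schema declared in the Python file. This is untested against the original data.

```pig
-- Hypothetical: sorted_agg.py defines a Jython UDF named sorted_agg
-- that takes a bag of tuples and returns a single (x, y, z) tuple.
register 'sorted_agg.py' using jython as myfuncs;

r2 = group r1 by key1;
r3 = foreach r2 {
    -- ORDER BY is one of the operators supported in a nested foreach
    s2 = order r1 by key2;
    -- pass the sorted bag to the UDF; flatten expands the returned tuple
    generate group, flatten(myfuncs.sorted_agg(s2));
}
store r3 into 'r3.out' using PigStorage(';');
```

This assumes the UDF sees the bag in the order produced by the nested ORDER BY, which is worth checking on a small input first.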