Pig, mail # user - STREAM in foreach block


Re: STREAM in foreach block
Dan Young 2012-09-18, 03:27
I believe these are the only operators supported in a nested foreach:

CROSS, DISTINCT, FILTER, FOREACH, LIMIT, and ORDER BY.

STREAM is not among them, so it cannot be used inside the foreach block.

See:

http://pig.apache.org/docs/r0.10.0/basic.html#foreach
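
Since ORDER BY is allowed in a nested foreach, one possible workaround is to sort inside the block and hand the sorted bag to a UDF in the GENERATE clause. Pig can register Python (Jython) UDFs, so no Java is needed. The following is only a sketch under assumptions: the file name order_agg.py, the function max_gap, the alias agg, and the position of key2 in each tuple are all hypothetical, and the "largest gap between consecutive key2 values" aggregate merely stands in for whatever order-dependent quantity the perl script computes.

```python
# order_agg.py (hypothetical file name)
#
# Assumed Pig usage, mirroring the aliases in the original question:
#
#   register 'order_agg.py' using jython as agg;
#   r2 = group r1 by key1;
#   r3 = foreach r2 {
#       s2 = order r1 by key2;
#       generate group, agg.max_gap(s2);
#   }

try:
    outputSchema  # injected by Pig when the script is registered using jython
except NameError:
    # Fallback so the file can also be run and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

KEY2_INDEX = 1  # assumption: key2 is the second field of each tuple in r1


@outputSchema("max_gap:double")
def max_gap(sorted_bag):
    """Return the largest difference between consecutive key2 values.

    The bag arrives as a sequence of tuples in the order produced by the
    nested ORDER BY, so the result depends on that sort order.
    """
    best = None
    prev = None
    for row in sorted_bag:
        cur = float(row[KEY2_INDEX])
        if prev is not None:
            gap = cur - prev
            if best is None or gap > best:
                best = gap
        prev = cur
    # None becomes null in Pig; returned for bags with fewer than two tuples.
    return best
```

The whole bag for a group is passed to the function in one call, so this approach assumes each group fits in memory.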
 On Sep 17, 2012 1:55 PM, "Kannan Shah" <[EMAIL PROTECTED]> wrote:

> I'm trying to group tuples by a key, sort by another key within each group,
> and then pass the sorted list of tuples for each group to a perl script. I
> need to use the perl script because I need to compute an aggregate quantity
> that is dependent on the sort order, and I'm not much of a Java programmer,
> so I don't know how to write a user-defined aggregate function.
>
> Doing this requires me to use STREAM in a foreach block, after the GROUP
> statement. Basically something like:
>
> r2 = group r1 by key1 ;
> r3 = foreach r2 {
>    s1=r1;
>    s2=order s1 by key2;
>    s3=stream s2 through myperlscript as (x,y,z);
>    generate group,flatten(s3.x),flatten(s3.y),flatten(s3.z);
> }
> store r3 into 'r3.out' using PigStorage(';');
>
> NOTE: The FLATTENs are there only for syntactic reasons; myperlscript will
> only output one tuple for each group.
>
> I'm getting errors that make me think that you can't use the STREAM operator
> within a foreach block, but I'm not sure. Can someone confirm? Is there a
> workaround to this sort of situation?
>
> Any help appreciated,
> Kannan
>