Pig >> mail # user >> STREAM in foreach block


Re: STREAM in foreach block
I believe these are the only operators supported inside a nested foreach:

CROSS, DISTINCT, FILTER, FOREACH, LIMIT, and ORDER BY.

STREAM is not on that list, so it can't be used inside the block.

See:

http://pig.apache.org/docs/r0.10.0/basic.html#foreach
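
One possible workaround, offered only as a sketch: keep the nested ORDER BY, but hand the sorted bag to a Jython UDF instead of a streamed script. Everything named here is hypothetical (the file name order_agg.py, the function order_dependent_agg, the assumption that key2 is numeric and is field 1 of each tuple), and the aggregate itself is just a placeholder for whatever the perl script computes. It also assumes the bag reaches the UDF in the order produced by the nested ORDER BY.

```python
# Hypothetical file: order_agg.py. Intended use from Pig (untested sketch):
#   register 'order_agg.py' using jython as myfuncs;
#   r2 = group r1 by key1;
#   r3 = foreach r2 {
#       s2 = order r1 by key2;
#       generate group, flatten(myfuncs.order_dependent_agg(s2));
#   }

try:
    outputSchema  # Pig's Jython engine is expected to provide this decorator
except NameError:
    def outputSchema(schema):
        # No-op stand-in so the file can also be run outside Pig.
        def wrap(func):
            return func
        return wrap


@outputSchema("t:(first:double, last:double, max_gap:double)")
def order_dependent_agg(bag):
    # `bag` is a list of tuples, assumed already sorted by the nested ORDER BY.
    # Assumption: the sort key (key2) is numeric and sits at index 1.
    values = [float(t[1]) for t in bag]
    if not values:
        return None
    # Placeholder aggregate that depends on order: the largest step
    # between consecutive values.
    max_gap = 0.0
    for prev, cur in zip(values, values[1:]):
        max_gap = max(max_gap, cur - prev)
    return (values[0], values[-1], max_gap)
```

Because the function returns a single tuple per group, the FLATTEN in the Pig snippet yields one output row per key1, which matches what the original script was meant to emit.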
 On Sep 17, 2012 1:55 PM, "Kannan Shah" <[EMAIL PROTECTED]> wrote:

> I'm trying to group tuples by a key, sort by another key within each group,
> and then pass the sorted list of tuples for each group to a perl script. I
> need to use the perl script because I need to compute an aggregate quantity
> that is dependent on the sort order, and I'm not much of a Java programmer,
> so I don't know how to write a user-defined aggregate function.
>
> Doing this requires me to use STREAM in a foreach block, after the GROUP
> statement. Basically something like:
>
> r2 = group r1 by key1 ;
> r3 = foreach r2 {
>    s1=r1;
>    s2=order s1 by key2;
>    s3=stream s2 through myperlscript as (x,y,z);
>    generate group,flatten(s3.x),flatten(s3.y),flatten(s3.z);
> }
> store r3 into 'r3.out' using PigStorage(';');
>
> NOTE: The FLATTENs are there only for syntactic reasons; myperlscript will
> only output one tuple for each group.
>
> I'm getting errors that make me think that you can't use the STREAM operator
> within a foreach block, but I'm not sure. Can someone confirm? Is there a
> workaround for this sort of situation?
>
> Any help appreciated,
> Kannan
>