Pig >> mail # user >> STREAM in foreach block


STREAM in foreach block
I'm trying to group tuples by a key, sort by another key within each group,
and then pass the sorted list of tuples for each group to a perl script. I
need to use the perl script because I need to compute an aggregate quantity
that is dependent on the sort order, and I'm not much of a Java programmer,
so I don't know how to write a user-defined aggregate function.

Doing this requires me to use STREAM in a foreach block, after the GROUP
statement. Basically something like:

r2 = group r1 by key1 ;
r3 = foreach r2 {
   s1=r1;
   s2=order s1 by key2;
   s3=stream s2 through myperlscript as (x,y,z);
   generate group,flatten(s3.x),flatten(s3.y),flatten(s3.z);
}
store r3 into 'r3.out' using PigStorage(';');

NOTE: The FLATTENs are there only for syntactic reasons; myperlscript will
only output one tuple for each group.

I'm getting errors that make me think that you can't use the STREAM operator
within a foreach block, but I'm not sure. Can someone confirm? Is there a
workaround for this sort of situation?

Any help appreciated,
Kannan
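
One possible workaround, sketched here under assumptions rather than taken
from the thread: keep the ORDER inside the foreach block, but hand the sorted
bag to a Python (Jython) UDF in the GENERATE clause instead of streaming it.
The file name sorted_agg.py, the function name sorted_agg, the field position
VALUE_INDEX, and the aggregate itself (first value, last value, largest gap
between consecutive values) are all hypothetical stand-ins for whatever
myperlscript computes.

```python
# Hypothetical file: sorted_agg.py
# Pig normally makes the outputSchema decorator available when it loads a
# Jython UDF. The fallback below only exists so the file also runs as
# plain Python for testing.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# Assumption: the value to aggregate is the second field of each tuple.
VALUE_INDEX = 1


@outputSchema("t:(x:double, y:double, z:double)")
def sorted_agg(bag):
    """Compute an order-dependent aggregate over a bag already sorted by key2.

    Returns (first value, last value, largest gap between neighbours),
    or None for an empty bag.
    """
    values = [float(t[VALUE_INDEX]) for t in bag]
    if not values:
        return None
    largest_gap = 0.0
    # Walk consecutive pairs in the order the bag was sorted.
    for prev, cur in zip(values, values[1:]):
        largest_gap = max(largest_gap, cur - prev)
    return (values[0], values[-1], largest_gap)
```

In the Pig script this would presumably be wired in with something like
register 'sorted_agg.py' using jython as myfuncs; followed by
generate group, flatten(myfuncs.sorted_agg(s2)); in place of the STREAM line.
That assumes a Pig version with Jython UDF support, and the exact schema
string may need adjusting to the real field types.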
Dan Young 2012-09-18, 03:27