Pig >> mail # user >> nested FOREACH statements

Re: nested FOREACH statements
Hi!

I haven't tried this script, but here is an idea:
flattened = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1);
grouped = GROUP flattened BY (initialGroup, lt, ln);
counted = FOREACH grouped GENERATE group AS wholeGroup, COUNT(flattened) AS aCount;
groupedAgain = GROUP counted BY wholeGroup.initialGroup;
maximums = FOREACH groupedAgain GENERATE group, TOP(1, 1, counted);

TOP takes the number of tuples to extract, the 0-based index of the column
to compare (here 1, i.e. aCount), and the bag.
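
If TOP turns out to be awkward, the last step could probably also be written with ORDER and LIMIT inside a nested FOREACH, which is closer to the "order DESC, limit 1" you described. This is an untested sketch: it assumes the relations groupedAgain and counted and the field aCount from the script above, and the aliases ordered, top1 and mostProbable are just names I made up for illustration.

```pig
-- Untested sketch: pick the most frequent (lt, ln) pair per initial group.
-- Assumes groupedAgain = GROUP counted BY wholeGroup.initialGroup,
-- where counted has the fields (wholeGroup, aCount).
mostProbable = FOREACH groupedAgain {
    -- sort this group's bag of counted tuples, highest count first
    ordered = ORDER counted BY aCount DESC;
    -- keep only the first tuple of the sorted bag
    top1 = LIMIT ordered 1;
    GENERATE group AS initialGroup, FLATTEN(top1);
};
```

If two pairs tie for the highest count, LIMIT 1 keeps just one of them, so add a second sort key if the choice matters.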

Also, what version of Pig are you using? I haven't tried it, but I know
that there can be two levels of nesting. See the "Nested Cross/Foreach"
section of:
http://hortonworks.com/blog/new-features-in-apache-pig-0-10/

Hope that helps
Ruslan Al-Fakikh
On Fri, Jun 21, 2013 at 7:09 PM, Adamantios Corais <[EMAIL PROTECTED]> wrote:

> It seems that group is not supported in nested FOREACH statements. I have
> the following schema:
>
> data2: {group: chararray,data1: {(lt: chararray,ln: chararray)}}
>
> on which I want to flatten data1, group all pairs of (lt, ln), count, order
> DESC, and finally limit 1.
>
> The idea is to extract the most probable pair of (lt, ln) for each group.
> How would you recommend me to do that?
>