
Pig, mail # user - nested FOREACH statements


Re: nested FOREACH statements
Ruslan Al-Fakikh 2013-06-25, 10:09
Hi!

I haven't tried this script, but here is an idea:
flattened = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1);
grouped = GROUP flattened BY (initialGroup, lt, ln);
counted = FOREACH grouped GENERATE group AS wholeGroup, COUNT(flattened) AS aCount;
groupedAgain = GROUP counted BY wholeGroup.initialGroup;
-- TOP(topN, column, bag): keep the 1 tuple with the highest value in column 1 (aCount; columns are 0-based)
maximums = FOREACH groupedAgain GENERATE group, TOP(1, 1, counted);

Also, what version of Pig are you using? I haven't tried it, but I know that
since Pig 0.10 there can be two levels of nesting:
http://hortonworks.com/blog/new-features-in-apache-pig-0-10/
(see the "Nested Cross/Foreach" section)
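Another way to do the "order DESC, limit 1" part: Pig does not accept GROUP inside a nested FOREACH block, but it does accept ORDER and LIMIT there. So the counting can be done with top-level GROUPs and only the final ranking moved into a nested block. This is an untested sketch; the input file 'pairs.tsv', its (grp, lt, ln) column layout and all the aliases are hypothetical, made up only to rebuild the data2 schema from the question.

```pig
-- Hypothetical input: tab-separated rows of (grp, lt, ln)
raw = LOAD 'pairs.tsv' USING PigStorage('\t') AS (grp:chararray, lt:chararray, ln:chararray);

-- Rebuild the schema from the question: data2: {group, data1: {(lt, ln)}}
byGrp = GROUP raw BY grp;
data2 = FOREACH byGrp GENERATE group, raw.(lt, ln) AS data1;

-- Flatten the inner bag and count each (group, lt, ln) combination at top level
flattened = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1) AS (lt, ln);
grouped = GROUP flattened BY (initialGroup, lt, ln);
counted = FOREACH grouped GENERATE FLATTEN(group) AS (initialGroup, lt, ln), COUNT(flattened) AS aCount;

-- Regroup by the original key; ORDER and LIMIT are allowed inside the nested block
byInitial = GROUP counted BY initialGroup;
mostProbable = FOREACH byInitial {
    sorted = ORDER counted BY aCount DESC;
    top1 = LIMIT sorted 1;
    GENERATE FLATTEN(top1);
};

DUMP mostProbable;
```

Each output tuple should be (initialGroup, lt, ln, aCount). If two pairs tie for the highest count, LIMIT 1 keeps an arbitrary one of them, just as TOP(1, ...) would.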

Hope that helps
Ruslan Al-Fakikh
On Fri, Jun 21, 2013 at 7:09 PM, Adamantios Corais <[EMAIL PROTECTED]> wrote:

> It seems that group is not supported in nested FOREACH statements. I have
> the following schema:
>
> data2: {group: chararray,data1: {(lt: chararray,ln: chararray)}}
>
> on which I want to flatten data1, group all pairs of (lt, ln), count, order
> DESC, and finally limit 1.
>
> The idea is to extract the most probable pair of (lt, ln) for each group.
> How would you recommend doing that?
>