Pig, mail # user - What's the equivalent of a GROUP BY statement within a FOREACH statement?


Re: What's the equivalent of a GROUP BY statement within a FOREACH statement?
Jacob Perkins 2014-03-20, 14:27
Adam,

Take a look at the CountEach UDF in the DataFu library (http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/CountEach.html). E.g.:
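-- Assumes the DataFu jar is available; the jar path below is a placeholder.
register datafu-1.2.0.jar;
define CountEach datafu.pig.bags.CountEach();

-- Assumes raw3 is raw2 grouped by (field1, field2, field3).
res = foreach raw3 {
        field4bag = foreach raw2 generate field4;
        field5bag = foreach raw2 generate field5;
        -- CountEach emits one (tuple, count) pair per distinct tuple in the bag
        field4cnts = CountEach(field4bag);
        field5cnts = CountEach(field5bag);

        -- TOP(n, column, bag): keep the 1 tuple with the highest count (column 1)
        field4max = TOP(1, 1, field4cnts);
        field5max = TOP(1, 1, field5cnts);
        generate
          flatten(group) as (field1, field2, field3),
          flatten(field4max.tuple_schema.$0) as field4max,
          flatten(field5max.tuple_schema.$0) as field5max;
      };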

This generates (1,2,3,(a),(x)) for your input. You can rearrange the fields however you like with further projections downstream.
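
For anyone who wants to check the logic without a cluster, here is a rough Python sketch of what the script above computes: for each (field1, field2, field3) group, the most frequent field4 and the most frequent field5. The sample rows and the function name are made up for illustration (the original input is not shown in this message), and tie-breaking may differ from what TOP does in Pig.

```python
from collections import Counter, defaultdict

# Hypothetical rows (field1..field5); not the poster's actual data.
rows = [
    (1, 2, 3, "a", "x"),
    (1, 2, 3, "a", "y"),
    (1, 2, 3, "b", "x"),
]

def most_frequent_per_group(rows):
    """Return one (field1, field2, field3, top_field4, top_field5) per group."""
    # Equivalent of: group raw2 by (field1, field2, field3)
    groups = defaultdict(list)
    for f1, f2, f3, f4, f5 in rows:
        groups[(f1, f2, f3)].append((f4, f5))

    result = []
    for key, members in groups.items():
        # Roughly what CountEach does: count occurrences of each distinct value
        f4_counts = Counter(f4 for f4, _ in members)
        f5_counts = Counter(f5 for _, f5 in members)
        # Roughly what TOP(1, 1, ...) does: keep the value with the highest count
        f4_max = f4_counts.most_common(1)[0][0]
        f5_max = f5_counts.most_common(1)[0][0]
        result.append(key + (f4_max, f5_max))
    return result

print(most_frequent_per_group(rows))
```

Unlike the Pig output, the sketch returns the winning values as plain fields rather than wrapped in single-element tuples.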

Best of luck.

@thedatachef
On Mar 20, 2014, at 5:59 AM, Adamantios Corais <[EMAIL PROTECTED]> wrote: