Pig, mail # user - Re: What's the equivalent of a GROUP BY statement within a FOREACH statement? - 2014-03-20, 14:27
Re: What's the equivalent of a GROUP BY statement within a FOREACH statement?
Adam,

Take a look at the CountEach UDF in the DataFu library (http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/CountEach.html). For example:
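-- Assumes raw3 = group raw2 by (field1, field2, field3);
-- the jar path below is a placeholder for your local DataFu 1.2.0 jar.
register datafu-1.2.0.jar;
define CountEach datafu.pig.bags.CountEach();

res = foreach raw3 {
    -- project each column of the grouped bag into its own bag
    field4bag = foreach raw2 generate field4;
    field5bag = foreach raw2 generate field5;
    -- count each distinct value, giving a bag like {((value), count), ...}
    field4cnts = CountEach(field4bag);
    field5cnts = CountEach(field5bag);

    -- keep the single tuple with the highest count (column 1)
    field4max = TOP(1, 1, field4cnts);
    field5max = TOP(1, 1, field5cnts);
    generate
        flatten(group) as (field1, field2, field3),
        flatten(field4max.tuple_schema.$0) as field4max,
        flatten(field5max.tuple_schema.$0) as field5max;
};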

This generates (1,2,3,(a),(x)) for your input. You can add further projections downstream to rearrange the fields however you like.
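To sanity-check what the nested block computes per group, here is a small Python model of the same logic. It is not Pig or DataFu code. It groups rows by (field1, field2, field3), counts each distinct field4 and field5 value (the role of CountEach), and keeps the value with the highest count (the role of TOP(1, 1, ...)). The function name `modes_per_group` and the sample rows are hypothetical, chosen only to reproduce the shape of the output above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical row layout: (field1, field2, field3, field4, field5)
Row = Tuple[int, int, int, str, str]
Key = Tuple[int, int, int]


def modes_per_group(rows: List[Row]) -> Dict[Key, Tuple[str, str]]:
    """Model of the nested FOREACH: for each (field1, field2, field3) key,
    return the most frequent field4 value and the most frequent field5 value."""
    groups: Dict[Key, List[Row]] = {}
    # Outer GROUP BY (field1, field2, field3), like raw3 = group raw2 by ...
    for row in rows:
        key = (row[0], row[1], row[2])
        groups.setdefault(key, []).append(row)

    result: Dict[Key, Tuple[str, str]] = {}
    for key, bag in groups.items():
        # CountEach analogue: count occurrences of each distinct value in the bag
        field4cnts = Counter(r[3] for r in bag)
        field5cnts = Counter(r[4] for r in bag)
        # TOP(1, 1, ...) analogue: keep the value with the highest count
        field4max = field4cnts.most_common(1)[0][0]
        field5max = field5cnts.most_common(1)[0][0]
        result[key] = (field4max, field5max)
    return result


if __name__ == "__main__":
    sample = [(1, 2, 3, "a", "x"), (1, 2, 3, "a", "y"), (1, 2, 3, "b", "x")]
    print(modes_per_group(sample))  # {(1, 2, 3): ('a', 'x')}
```

One caveat: when two values tie for the top count, `Counter.most_common` returns the one seen first, whereas Pig's TOP makes no such ordering promise, so tied groups may differ between the model and the real job.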

Best of luck.

@thedatachef
On Mar 20, 2014, at 5:59 AM, Adamantios Corais <[EMAIL PROTECTED]> wrote:

 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB