NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> What's the equivalent of a GROUP BY statement within a FOREACH statement?


Re: What's the equivalent of a GROUP BY statement within a FOREACH statement?
Adam,

Take a look at the CountEach UDF in the DataFu library (http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/CountEach.html). For example:
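-- Assumes the DataFu jar is already REGISTERed and that raw3 is raw2
-- grouped by (field1, field2, field3).
DEFINE CountEach datafu.pig.bags.CountEach();

res = foreach raw3 {
        -- project each column of interest into its own bag
        field4bag = foreach raw2 generate field4;
        field5bag = foreach raw2 generate field5;
        -- count how often each distinct value occurs in the bag
        field4cnts = CountEach(field4bag);
        field5cnts = CountEach(field5bag);

        -- keep the single tuple with the highest count (column 1)
        field4max = TOP(1, 1, field4cnts);
        field5max = TOP(1, 1, field5cnts);
        generate
          flatten(group) as (field1, field2, field3),
          flatten(field4max.tuple_schema.$0) as field4max,
          flatten(field5max.tuple_schema.$0) as field5max;
      };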

This generates (1,2,3,(a),(x)) for your input. You can add further projections downstream to rearrange the fields however you like.
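To make explicit what the nested FOREACH computes (the most frequent field4 and field5 value within each (field1, field2, field3) group), here is a minimal Python sketch of the same logic. It only models the semantics and is not DataFu's implementation: count_each, top1_by_count and most_frequent_per_group are hypothetical helper names, and the sample rows are made up because the original question's input is not included here.

```python
from collections import Counter, defaultdict


def count_each(values):
    """Model of CountEach: one (value, count) pair per distinct value."""
    return list(Counter(values).items())


def top1_by_count(pairs):
    """Model of TOP(1, 1, bag): the pair with the largest count (column 1).
    max() returns the first maximal pair, so ties go to the first-seen value."""
    return max(pairs, key=lambda pair: pair[1])


def most_frequent_per_group(rows):
    """Group 5-field rows by their first three fields, then emit
    (field1, field2, field3, most common field4, most common field5)."""
    groups = defaultdict(list)  # models GROUP raw2 BY (field1, field2, field3)
    for row in rows:
        groups[row[:3]].append(row)

    result = []
    for key, bag in groups.items():
        field4max = top1_by_count(count_each(r[3] for r in bag))[0]
        field5max = top1_by_count(count_each(r[4] for r in bag))[0]
        result.append(key + (field4max, field5max))
    return result


# Hypothetical sample input: (field1, field2, field3, field4, field5)
rows = [(1, 2, 3, "a", "x"), (1, 2, 3, "a", "y"), (1, 2, 3, "b", "x")]
print(most_frequent_per_group(rows))
```

With these sample rows the result is [(1, 2, 3, 'a', 'x')], analogous to the Pig output but without the inner tuples. Ties are broken here by first occurrence; Pig's TOP is not guaranteed to pick the same value among equal counts.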

Best of luck.

@thedatachef
On Mar 20, 2014, at 5:59 AM, Adamantios Corais <[EMAIL PROTECTED]> wrote:

 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB