Pig user mailing list


Aggregations on nested foreach statements
I have data that looks like this:

a e 11 0
b f 2 2
c g 3 3
c h 44 44
c i 75 0
d j 89 0
d k 120 0
d l 3000 0

and I load it like so:

data = load 'test.txt' using PigStorage(' ') as (cid:chararray,
iid:chararray, num1:int, num2:int);

I want to group by the first column, cid.  For each group, if any of the
num2 values (last column) are positive, I want to output every tuple in
that group with an extra field equal to num1.  If all the num2 values for
that group are zero, then I want to output every tuple in that group with
an extra field equal to 0.
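To pin down the expected output, here is a minimal Python sketch of the rule above, applied to the sample rows. It is only a reference model of the desired result, not Pig code; `ROWS` and `extend_rows` are hypothetical names. It tests "any num2 is positive" directly with `any(...)`. Note that the Pig attempt below uses `SUM(data.num2) > 0`, which matches that rule only when num2 is never negative (a `MAX(...) > 0` test would match it exactly).

```python
# Sample rows from the post, as (cid, iid, num1, num2) tuples.
ROWS = [
    ("a", "e", 11, 0),
    ("b", "f", 2, 2),
    ("c", "g", 3, 3),
    ("c", "h", 44, 44),
    ("c", "i", 75, 0),
    ("d", "j", 89, 0),
    ("d", "k", 120, 0),
    ("d", "l", 3000, 0),
]


def extend_rows(rows):
    """Append an extra field to every row, decided once per cid group.

    If any num2 in the group is positive, extra = num1; otherwise extra = 0.
    (Hypothetical helper for illustration only.)
    """
    # Group rows by cid; dicts keep insertion order (Python 3.7+).
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)

    out = []
    for members in groups.values():
        # One decision per group...
        any_positive = any(num2 > 0 for _, _, _, num2 in members)
        # ...then every row in the group is emitted with the extra field.
        for cid, iid, num1, num2 in members:
            out.append((cid, iid, num1, num2, num1 if any_positive else 0))
    return out


if __name__ == "__main__":
    for row in extend_rows(ROWS):
        print(*row)
```

For the sample data, every row of group c keeps its num1 (including `c i 75 0 75`), while groups a and d, whose num2 values are all zero, get 0.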

I figured something like this would work:

data = load 'test.txt' using PigStorage(' ') as (cid:chararray,
iid:chararray, num1:int, num2:int);
grouped = group data by cid;
results = foreach grouped {
    result1 = SUM(data.num2);
    extended = foreach data generate *, result1 > 0 ? num1 : 0;
    generate FLATTEN(extended);
};

but it does not.  I get this error:

2013-01-22 17:15:07,647 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: <line 98, column 48>  mismatched input '>' expecting SEMI_COLON

What is the proper way to do this?  From the MapReduce perspective, I group
by the key, and in the reducer, I compute a value for each group, and then
emit every single value for that group along with some extra data.
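The MapReduce framing above can be sketched as a single-process Python simulation. `mapper`, `shuffle`, `reducer` and `run` are hypothetical helpers, not Hadoop or Pig APIs, and `shuffle` merely stands in for the framework's sort-and-group step. The key point it illustrates is that the reducer must see the whole group before it can emit any row, so it buffers the group's values, computes the per-group flag, and then re-emits every row with the extra field.

```python
from collections import defaultdict

# The sample input from the post, one space-delimited record per line.
SAMPLE = """a e 11 0
b f 2 2
c g 3 3
c h 44 44
c i 75 0
d j 89 0
d k 120 0
d l 3000 0"""


def mapper(line):
    """Parse one line and key it by cid (the first column)."""
    cid, iid, num1, num2 = line.split()
    return cid, (cid, iid, int(num1), int(num2))


def shuffle(pairs):
    """Group mapper output by key and sort by key, like the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Compute one value for the group, then emit every row with it."""
    values = list(values)  # buffer the group: it is read twice below
    keep_num1 = max(num2 for _, _, _, num2 in values) > 0
    for _, iid, num1, num2 in values:
        yield (key, iid, num1, num2, num1 if keep_num1 else 0)


def run(text):
    """Map every non-blank line, shuffle, and reduce each group in key order."""
    pairs = [mapper(line) for line in text.splitlines() if line.strip()]
    results = []
    for key, values in shuffle(pairs):
        results.extend(reducer(key, values))
    return results


if __name__ == "__main__":
    for row in run(SAMPLE):
        print(*row)
```

Using `max(num2) > 0` makes the reducer's test equivalent to "any num2 is positive" even if negative values appear, which a sum-based test would not guarantee.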

Thanks!
Uri

--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447