Pig user mailing list


Filtering based on the value of an aggregate function?
I would like to know if there is a better way to do the following.

GIVEN:

(name:chararray, score:float)

I would like to filter out all records that are below the average score.
This is what I came up with:

data = load 'input.dat' using PigStorage('\t') as (name:chararray, score:float);

data_all = group data all;

avg_score = foreach data_all generate AVG(data.score) as avg_score;

data_avg = cross data, avg_score;

describe data_avg;

above_avg = filter data_avg by score > avg_score;
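To make the script's semantics concrete, here is a rough in-memory model in Python (not Pig) of what each statement computes. It assumes the input fits in a list of `(name, score)` tuples with no null scores. The function name `above_average` is hypothetical. Because `group data all` yields exactly one group, `avg_score` holds a single row, so the `cross` just appends the average to every record. It does not multiply the row count.

```python
from statistics import mean
from typing import List, Tuple

Record = Tuple[str, float]


def above_average(data: List[Record]) -> List[Tuple[str, float, float]]:
    """Hypothetical model of the Pig script; assumes no null scores."""
    # data_all = group data all;
    # avg_score = foreach data_all generate AVG(data.score) as avg_score;
    # With empty input, "group all" produces no group, so avg_score is empty.
    avg_rows = [(mean(score for _, score in data),)] if data else []

    # data_avg = cross data, avg_score;
    # Pairs every record with every row of avg_score (here, at most one row).
    data_avg = [(name, score, avg) for (name, score) in data for (avg,) in avg_rows]

    # above_avg = filter data_avg by score > avg_score;
    # A strict ">" drops records equal to the average as well as those below it.
    return [row for row in data_avg if row[1] > row[2]]


print(above_average([("a", 1.0), ("b", 2.0), ("c", 6.0)]))
```

Each output row keeps the appended average, just as `data_avg` carries the `avg_score` column through the filter.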

Is there a better or more acceptable way to make avg_score accessible
during the filter step, other than doing a cross?
Jonathan Packer 2013-07-29, 19:54