Pig >> mail # user >> Filtering based on the value of an aggregate function?
Filtering based on the value of an aggregate function?
I would like to know if there is a better way to do the following.

GIVEN:

(name:chararray, score:float)

I would like to filter out all records that are below the average score.
This is what I came up with:

data = load 'input.dat' using PigStorage('\t') as (name:chararray, score:float);

data_all = group data all;

avg_score = foreach data_all generate AVG(data.score) as avg_score;

data_avg = cross data, avg_score;

describe data_avg;

above_avg = filter data_avg by score > avg_score;

Is there a better or more acceptable way to make avg_score accessible
during the filter step, other than doing a cross?
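
One possible alternative, sketched here under the assumption that the Pig version in use supports scalar projection (introduced in Pig 0.8): a relation that holds exactly one row can be referenced directly inside an expression as relation.field, which avoids the cross. The relation and field names below are the ones from the script above; nothing else is assumed.

```pig
-- Load the same tab-separated input as in the original script.
data = load 'input.dat' using PigStorage('\t') as (name:chararray, score:float);

-- Collapse everything into a single group so AVG sees every score.
data_all = group data all;

-- avg_score is a one-row, one-column relation holding the overall average.
avg_score = foreach data_all generate AVG(data.score) as avg_score;

-- Scalar projection: avg_score.avg_score reads the single value from that
-- relation, so no cross is needed. Pig fails at runtime if the relation
-- has more than one row.
above_avg = filter data by score > avg_score.avg_score;

describe above_avg;
```

With this form above_avg keeps the original (name, score) schema, whereas the cross version carries the extra avg_score column on every record.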