Pig, mail # user - Filtering based on the value of an aggregate function?


Tim Chan 2013-07-29, 19:49
Re: Filtering based on the value of an aggregate function?
Jonathan Packer 2013-07-29, 19:54
You can cast the single-tuple relation "avg_score" to a scalar using the
following syntax:

above_avg = filter data by score > avg_score.avg_score;
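
For context, here is a minimal sketch of how the whole script might look with the cross removed. It assumes the same input file and schema as in the original question; the DUMP at the end is added only for illustration and is not part of the original script.

```pig
-- Load the tab-separated input (path and schema taken from the original question).
data = LOAD 'input.dat' USING PigStorage('\t') AS (name:chararray, score:float);

-- Put every record into a single group so AVG runs over the whole relation.
data_all = GROUP data ALL;

-- avg_score is a relation holding exactly one tuple with one field.
avg_score = FOREACH data_all GENERATE AVG(data.score) AS avg_score;

-- Refer to the single value as relation.field; no CROSS is needed.
above_avg = FILTER data BY score > avg_score.avg_score;

-- Illustrative only: print the result.
DUMP above_avg;
```

This relies on avg_score containing exactly one tuple, which GROUP ... ALL guarantees for non-empty input. As far as I know, Pig fails at runtime if a relation used as a scalar has more than one row.
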
On Mon, Jul 29, 2013 at 3:49 PM, Tim Chan <[EMAIL PROTECTED]> wrote:

> I would like to know if there is a better way to do the following.
>
> GIVEN:
>
> (name:chararray, score:float)
>
> I would like to filter out all records that are below the average score.
>
>
> This is what I came up with:
>
> data = load 'input.dat' using PigStorage('\t') as (name:chararray,
> score:float);
>
> data_all = group data all;
>
> avg_score = foreach data_all generate AVG(data.score) as avg_score;
>
> data_avg = cross data, avg_score;
>
> describe data_avg;
>
> above_avg = filter data_avg by score > avg_score;
>
>
> Is there a better or more acceptable way to make avg_score accessible
> during the filter step, other than doing a cross?
>