Pig >> mail # user >> Filtering based the value of an aggregate function?


Re: Filtering based the value of an aggregate function?
You can cast the single-tuple relation "avg_score" to a scalar using the
following syntax:

above_avg = filter data by score > avg_score.avg_score;
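
Put together with the script from the question, that might look like the sketch below. The file name and schema are copied from the original message; the final dump is only added here for illustration. Note that Pig fails at runtime if the relation used as a scalar contains more than one tuple, so this relies on "group ... all" producing exactly one row.

```pig
-- Load the tab-separated input described in the question.
data = load 'input.dat' using PigStorage('\t') as (name:chararray, score:float);

-- Collapse everything into a single group so AVG sees every score.
data_all = group data all;

-- avg_score is a relation holding exactly one tuple with one field.
avg_score = foreach data_all generate AVG(data.score) as avg_score;

-- Project the single-tuple relation as a scalar; no cross is needed.
above_avg = filter data by score > avg_score.avg_score;

-- Illustration only: print the surviving records.
dump above_avg;
```

Compared with the cross, the filtered relation keeps the original (name, score) schema, with no extra avg_score column carried along on every row.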
On Mon, Jul 29, 2013 at 3:49 PM, Tim Chan <[EMAIL PROTECTED]> wrote:

> I would like to know if there is a better way to do the following.
>
> GIVEN:
>
> (name:chararray, score:float)
>
> I would like to filter out all records that are below the average score.
>
>
> This is what I came up with:
>
> data = load 'input.dat' using PigStorage('\t') as (name:chararray,
> score:float);
>
> data_all = group data all;
>
> avg_score = foreach data_all generate AVG(data.score) as avg_score;
>
> data_avg = cross data, avg_score;
>
> describe data_avg;
>
> above_avg = filter data_avg by score > avg_score;
>
>
> Is there a better or more acceptable way to make avg_score accessible
> during the filter step, other than doing a cross?
>