Re: nested order limit by percentage of overall records
You should check out the quantile UDFs in LinkedIn's DataFu library:
https://github.com/linkedin/datafu, specifically Quantile for
relatively small inputs, and StreamingQuantile for
larger inputs.

You can use these to compute the cutoff value for the top x% of any given
field, and then use that cutoff within a filter.
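
For example, here is a rough, untested sketch of the top 10% per region,
assuming the exact datafu.pig.stats.Quantile UDF, which expects a sorted bag.
The jar path and the alias names (P90, monies, cutoffs, joined) are
placeholders I made up for illustration:

```pig
-- Assumption: adjust the jar path/version to your installation.
REGISTER datafu.jar;

-- The 0.9 quantile is the cutoff above which the top 10% of values lie.
DEFINE P90 datafu.pig.stats.Quantile('0.9');

users = LOAD 'users' AS (userid:chararray, money:long, region:chararray);
grouped_region = GROUP users BY region;

-- One cutoff per region. Quantile needs its input bag sorted ascending.
cutoffs = FOREACH grouped_region {
    monies = users.money;
    sorted = ORDER monies BY money;
    GENERATE group AS region, FLATTEN(P90(sorted)) AS cutoff;
};

-- Attach each region's cutoff to its users, then keep rows at or above it.
joined = JOIN users BY region, cutoffs BY region;
top_10_percent = FILTER joined BY users::money >= cutoffs::cutoff;
```

Note that rows tied at the cutoff all pass the filter, so a region can end up
with slightly more than 10% of its users.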
On Mon, Mar 18, 2013 at 6:23 AM, Marco Cadetg <[EMAIL PROTECTED]> wrote:

> Hi there,
> I would like to do something very similar to a nested foreach using
> order by and then limit, but I would like the limit to be relative to the
> total number of records.
> users = load 'users' as (userid:chararray, money:long, region:chararray);
> grouped_region = group users by region;
> top_10_percent = foreach grouped_region {
>             sorted = order users by money desc;
>             -- e.g. for the top 10% it would be total users/10 in that region.
>             top    = limit sorted $UKNOWN_HOWTO_SET;
>             generate group, flatten(top);
> };
> Thanks a lot for any help on this.
> Cheers,
> -Marco

Mike Sukmanowsky

Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY  10018
p: +1 (416) 953-4248