Pig user mailing list: nested order limit by percentage of overall records


Re: nested order limit by percentage of overall records
You should check out the quantile UDFs in LinkedIn's DataFu library:
https://github.com/linkedin/datafu, specifically
https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/stats/Quantile.java for
relatively small inputs, and
https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/stats/StreamingQuantile.java for
larger inputs.
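To make the "relatively small inputs" case concrete: an exact quantile can be read directly off a sorted list of values. The sketch below is plain Python, not Pig or DataFu, and uses the simple nearest-rank definition. That definition is an assumption for illustration and not necessarily the estimation method DataFu's Quantile uses. The function name nearest_rank_quantile is hypothetical.

```python
import math


def nearest_rank_quantile(sorted_values, q):
    """Return the q-quantile (0 < q <= 1) of an ascending list, nearest-rank method."""
    if not sorted_values:
        raise ValueError("need at least one value")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    # Nearest rank: the smallest value with at least q * n values at or below it.
    # round() guards against floating-point noise in q * n before taking the ceiling.
    rank = max(1, math.ceil(round(q * len(sorted_values), 9)))
    return sorted_values[rank - 1]


money = sorted([5, 1, 9, 3, 7, 2, 8, 6, 4, 10])  # the input must be sorted first
print(nearest_rank_quantile(money, 0.9))  # 9
```

Sorting the whole group first is what makes this exact but memory-hungry for big groups, which is why a separate approach for larger inputs is suggested above.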

You can use this to compute the cutoff value for the top x% of any given field
(e.g. the 0.9 quantile for the top 10%) and then use that value within a filter.
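The threshold-then-filter idea can be sketched in Python over (userid, money, region) tuples, matching the schema in the question below. The helper name top_fraction_by_threshold and the nearest-rank cutoff are assumptions for illustration, not DataFu's API. Note the strict > comparison: users tied at the cutoff are kept or dropped together, so the result may not be exactly x% of a region, unlike a LIMIT.

```python
import math
from collections import defaultdict


def top_fraction_by_threshold(users, fraction):
    """Keep users whose money is strictly above their region's (1 - fraction) quantile.

    users: list of (userid, money, region) tuples; 0 < fraction < 1.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be in (0, 1)")
    # Step 1: collect each region's money values (the group by region).
    by_region = defaultdict(list)
    for _userid, money, region in users:
        by_region[region].append(money)
    # Step 2: one cutoff per region, read off the sorted values (nearest rank).
    q = 1 - fraction
    cutoffs = {}
    for region, values in by_region.items():
        values.sort()
        rank = max(1, math.ceil(round(q * len(values), 9)))
        cutoffs[region] = values[rank - 1]
    # Step 3: the filter; users tied at the cutoff are all excluded together.
    return [u for u in users if u[1] > cutoffs[u[2]]]


users = [("u%d" % i, i, "eu") for i in range(1, 11)] + [("a", 100, "us"), ("b", 50, "us")]
print(top_fraction_by_threshold(users, 0.1))  # [('u10', 10, 'eu')]
```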
On Mon, Mar 18, 2013 at 6:23 AM, Marco Cadetg <[EMAIL PROTECTED]> wrote:

> Hi there,
>
> I would like to do something very similar to a nested foreach using
> order by and then limit, but I would like the limit to be relative to the
> total number of records in each group.
>
> users = load 'users' as (userid:chararray, money:long, region:chararray);
> grouped_region = group users by region;
> top_10_percent = foreach grouped_region {
>             sorted = order users by money desc;
>             -- e.g. for the top 10% this would be total users / 10 in that region
>             top    = limit sorted $UNKNOWN_HOW_TO_SET;
>             generate group, flatten(top);
> };
>
> Thanks a lot for any help on this.
>
> Cheers,
> -Marco
>
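For comparison, the per-group "order, then limit by a fraction of the group size" that the question asks for can be sketched in Python rather than Pig. It uses integer arithmetic, floor(n * percent / 100), following "total users/10" in the question. The function name top_percent_per_region is hypothetical. Each output row is (region, userid, money, region), mirroring generate group, flatten(top).

```python
from collections import defaultdict


def top_percent_per_region(users, percent):
    """Per region: sort by money descending, keep the first n * percent // 100 rows.

    users: list of (userid, money, region) tuples; percent: integer 0..100.
    """
    groups = defaultdict(list)
    for user in users:
        groups[user[2]].append(user)  # group users by region
    result = []
    for region, members in groups.items():
        ordered = sorted(members, key=lambda u: u[1], reverse=True)  # order by money desc
        limit = len(ordered) * percent // 100  # e.g. total users / 10 for percent=10
        result.extend((region,) + u for u in ordered[:limit])  # generate group, flatten(top)
    return result


users = [("u%d" % i, i, "eu") for i in range(1, 11)] + [("a", 100, "us"), ("b", 50, "us")]
print(top_percent_per_region(users, 10))  # [('eu', 'u10', 10, 'eu')]
```

Unlike a threshold filter, this always returns exactly floor(n * percent / 100) rows per region, breaking ties by input order (Python's sort is stable). Regions with fewer than 100 / percent users contribute nothing.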

--
Mike Sukmanowsky

Product Lead, http://parse.ly