Pig >> mail # user >> How to LIMIT a relation by percentage


Re: How to LIMIT a relation by percentage
Hi Ruslan -- no need to write your own UDF.  There is a built-in
function TOP() which will extract for you the top N tuples of a
relation, where N is a configurable parameter.  Take a look at:

http://pig.apache.org/docs/r0.9.0/func.html#topx
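
One thing to keep in mind: TOP() takes a count, not a percentage, so for "top 5%" the value of N has to be derived from the size of the relation first. The sketch below is plain Python rather than Pig Latin and only illustrates that arithmetic. The function name top_percent, the sample rows, and the choice to always keep at least one row are assumptions made for the example and are not part of Pig.

```python
import heapq
import math


def top_percent(rows, column, percent):
    """Return the top `percent` of rows, ranked by rows[i][column], largest first.

    Hypothetical helper for illustration only; it mimics "count the rows,
    derive N, then take the top N", it is not how Pig implements TOP().
    """
    if not rows:
        return []
    # Derive N from the row count. Rounding up and keeping at least one
    # row is an assumption; pick whatever rounding rule suits your data.
    n = max(1, math.ceil(len(rows) * percent / 100))
    # Take the N rows with the largest value in the chosen column.
    return heapq.nlargest(n, rows, key=lambda row: row[column])


# 40 sample rows of (name, score); the top 5% of 40 rows is 2 rows.
rows = [("user%d" % i, i) for i in range(40)]
best = top_percent(rows, column=1, percent=5)
print(best)
```

In a Pig script the same two steps would be a count of the relation followed by TOP() (or ORDER and LIMIT) with the computed N. How N gets passed in, for example as a script parameter, depends on the Pig version in use.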

Norbert

On Thu, Sep 8, 2011 at 9:13 AM, Ruslan Al-Fakikh
<[EMAIL PROTECTED]> wrote:
> Hey guys,
>
> How can I LIMIT a relation by percentage?
> What I need is to sort a relation by a numeric column and then take
> the top 5% of tuples.
> As far as I understand I cannot use an expression in the LIMIT
> operator. Do I have to write my own UDF? What type of UDF should I use
> then?
>
> --
> Best Regards,
> Ruslan Al-Fakikh
>