Pig >> mail # user >> How to LIMIT a relation by percentage


Re: How to LIMIT a relation by percentage
Hi Dmitriy -- great info, thanks.

On Thu, Sep 8, 2011 at 12:19 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> You could also do it with TOP as Norbert suggests, but that has a bit of
> extra cost due to the sort TOP does.

Just for my understanding, doesn't the ORDER BY in the PIG-1926
example impose the same sort cost?  It seems that you'd have to pay
for a sort as long as the requirement is top N.

Norbert

> On Thu, Sep 8, 2011 at 6:42 AM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
>> Hi Ruslan -- no need to write your own UDF.  There is a built-in
>> function TOP() which will extract for you the top N tuples of a
>> relation, where N is a configurable parameter.  Take a look at:
>>
>> http://pig.apache.org/docs/r0.9.0/func.html#topx
>>
>> Norbert
>>
>> On Thu, Sep 8, 2011 at 9:13 AM, Ruslan Al-Fakikh
>> <[EMAIL PROTECTED]> wrote:
>> > Hey guys,
>> >
>> > How can I LIMIT a relation by percentage?
>> > What I need is to sort a relation by a numeric column and then take
>> > top 5% of tuples.
>> > As far as I understand I cannot use an expression in the LIMIT
>> > operator. Do I have to write my own UDF? What type of UDF should I use
>> > then?
>> >
>> > --
>> > Best Regards,
>> > Ruslan Al-Fakikh
>> >
>>
>
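
For readers following the thread, here is a minimal sketch of the "top 5% by a numeric column" requirement, written in plain Python rather than Pig Latin. It is only an illustration of the two strategies under discussion (a full sort followed by a cut, versus a bounded top-N selection) and says nothing about how Pig implements ORDER BY, LIMIT, or TOP. The function names, the sample rows, and the choice to round the row count up are all assumptions made for this example.

```python
import heapq
import math


def cutoff(total_rows, percent):
    """Number of rows that make up `percent` of the relation.

    Rounding up is an assumption; the thread does not say how a
    fractional row count should be handled.
    """
    return math.ceil(total_rows * percent / 100)


def top_percent_sorted(rows, key, percent):
    """Sort every row in descending order, then keep the leading slice."""
    n = cutoff(len(rows), percent)
    return sorted(rows, key=key, reverse=True)[:n]


def top_percent_heap(rows, key, percent):
    """Select the n largest rows without producing a full sorted copy."""
    n = cutoff(len(rows), percent)
    return heapq.nlargest(n, rows, key=key)


if __name__ == "__main__":
    # Hypothetical relation: (id, score) tuples.
    relation = [(i, i * 2) for i in range(100)]
    print(top_percent_sorted(relation, key=lambda r: r[1], percent=5))
    print(top_percent_heap(relation, key=lambda r: r[1], percent=5))
```

Both functions need the total row count before they can choose a cutoff, which is why a percentage limit takes a counting step in addition to the ordering itself.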