Pig, mail # user - Group by Fetching top 100 from each group


Re: Group by Fetching top 100 from each group
Kris Coward 2012-06-30, 00:02

LIMIT and ORDER BY are both allowed as nested operations inside a FOREACH
block. Together they should be able to do what you want.

e.g.

B = GROUP A BY key;
C = FOREACH B {
    -- sort each group's bag, then keep its first 100 records
    X = ORDER A BY orderingParam;
    Y = LIMIT X 100;
    GENERATE group, Y;}
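
For a more complete picture, here is a rough sketch of the same pattern as a whole script. The file name 'input.tsv', the schema, and the field name score are made up for illustration, and DESC assumes that "top" means the largest values first (ORDER sorts ascending by default). FLATTEN is optional; it is shown only as one way to get one output row per record rather than one bag per group.

```pig
-- Hypothetical input: tab-separated (key, score) records.
A = LOAD 'input.tsv' AS (key:chararray, score:double);

B = GROUP A BY key;

C = FOREACH B {
    -- Sort this group's bag so the largest scores come first.
    X = ORDER A BY score DESC;
    -- Keep at most 100 records from the sorted bag.
    Y = LIMIT X 100;
    -- FLATTEN un-nests the bag: one row per retained record.
    GENERATE group, FLATTEN(Y);
};

DUMP C;
```

Without FLATTEN, each output row would be the group key plus a bag holding up to 100 records, as in the shorter example.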

-Kris

On Fri, Jun 29, 2012 at 04:19:18PM -0700, Benjamin Juhn wrote:
> Hi there,
>
> I'm trying to write a group by statement, only returning the top 100 records from each group.  Does pig support this?
>
> Thanks,
> Ben

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3