Re: Group by Fetching top 100 from each group
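Ideally, you should use the built-in TOP function. It will be more
efficient because it is algebraic: Pig can run it in the combiner, so each
map task passes along only its partial top-N instead of every record.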
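
A minimal sketch of what that could look like, reusing the relation and field
names from Kris's example below. The LOAD path and the schema are assumptions
made up for illustration; adjust the column index to wherever your ordering
field actually sits.

```pig
-- Hypothetical input: tab-separated records of (key, orderingParam).
A = LOAD 'input' AS (key:chararray, orderingParam:long);

B = GROUP A BY key;

-- TOP(n, column, bag): column is the 0-based index of the field to compare
-- on, so 1 selects orderingParam here. It keeps the n tuples with the
-- largest values of that field in each group's bag.
C = FOREACH B GENERATE group, TOP(100, 1, A);

-- Alternative: one output row per kept record instead of a bag per group.
-- C = FOREACH B GENERATE FLATTEN(TOP(100, 1, A));

DUMP C;
```

As far as I know, the bag returned by TOP is not guaranteed to be sorted, so
add a nested ORDER if you need the 100 records in rank order.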

2012/6/29 Kris Coward <[EMAIL PROTECTED]>

>
> LIMIT and ORDER BY are both allowed nested ops for a FOREACH statement.
> These should be able to do what you want.
>
> e.g.
>
> B = GROUP A BY key;
> C = FOREACH B {
>    -- append DESC to the next line to keep the largest values
>    X = ORDER A BY orderingParam;
>    Y = LIMIT X 100;
>    GENERATE group, Y;
> }
>
> -Kris
>
> On Fri, Jun 29, 2012 at 04:19:18PM -0700, Benjamin Juhn wrote:
> > Hi there,
> >
> > I'm trying to write a group by statement, only returning the top 100
> > records from each group.  Does pig support this?
> >
> > Thanks,
> > Ben
>
> --
> Kris Coward                                     http://unripe.melon.org/
> GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
>