

Re: Group by Fetching top 100 from each group

Yes, that is indeed better.
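
For reference, a rough sketch of what the TOP version might look like,
reusing relation A and the key field from the example quoted below. TOP
takes the number of tuples to keep, a 0-based column index to compare on,
and the bag. The index 1 used here is only an assumption (it pretends
orderingParam is the second field of A); adjust it to the actual schema.

    B = GROUP A BY key;
    C = FOREACH B {
        -- keep the 100 tuples of each group with the largest values in
        -- column 1 (assumed position of orderingParam, 0-based)
        Y = TOP(100, 1, A);
        GENERATE group, Y;
    }

As far as I know, the bag that TOP returns is not guaranteed to be sorted,
so if the order within each group matters, a nested ORDER on Y would still
be needed.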

On Fri, Jun 29, 2012 at 06:39:58PM -0700, Jonathan Coveney wrote:
> Ideally, you should use the TOP function. It will be more efficient because
> it is algebraic, so Pig can evaluate part of it in the combiner.
>
> 2012/6/29 Kris Coward <[EMAIL PROTECTED]>
>
> >
> > LIMIT and ORDER BY are both allowed nested ops for a FOREACH statement.
> > These should be able to do what you want.
> >
> > e.g.
> >
> > B = GROUP A BY key;
> > C = FOREACH B {
> >    X = ORDER A BY orderingParam;
> >    Y = LIMIT X 100;
> >    GENERATE group, Y;
> > }
> >
> > -Kris
> >
> > On Fri, Jun 29, 2012 at 04:19:18PM -0700, Benjamin Juhn wrote:
> > > Hi there,
> > >
> > > I'm trying to write a group by statement, only returning the top 100
> > > records from each group.  Does Pig support this?
> > >
> > > Thanks,
> > > Ben
> >
> > --
> > Kris Coward                                     http://unripe.melon.org/
> > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> >

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3