Pig >> mail # user >> Group by Fetching top 100 from each group


Benjamin Juhn 2012-06-29, 23:19
Sal Uryasev 2012-06-29, 23:27
Corbin Hoenes 2012-06-30, 02:44
Re: Group by Fetching top 100 from each group

LIMIT and ORDER BY are both allowed as nested operators inside a FOREACH block.
Sorting each group's bag and then limiting it should do what you want.

e.g.

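-- Group the input relation A by its key field.
B = GROUP A BY key;
C = FOREACH B {
    -- Sort each group's bag; append DESC to orderingParam for the largest values first.
    X = ORDER A BY orderingParam;
    -- Keep at most 100 records per group.
    Y = LIMIT X 100;
    GENERATE group, Y;
}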

-Kris

On Fri, Jun 29, 2012 at 04:19:18PM -0700, Benjamin Juhn wrote:
> Hi there,
>
> I'm trying to write a group by statement that returns only the top 100 records from each group. Does Pig support this?
>
> Thanks,
> Ben

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
Jonathan Coveney 2012-06-30, 01:39
Kris Coward 2012-06-30, 04:47
Austin Stickney 2012-06-29, 23:55