Pig >> mail # user >> Group by Fetching top 100 from each group


Benjamin Juhn 2012-06-29, 23:19
Sal Uryasev 2012-06-29, 23:27
Corbin Hoenes 2012-06-30, 02:44
Re: Group by Fetching top 100 from each group

LIMIT and ORDER BY are both allowed as nested operations inside a FOREACH
block, so combining them should give you the top 100 records of each group.

e.g.

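-- Group the records by key; each group carries a bag named A.
B = GROUP A BY key;
C = FOREACH B {
    -- Sort the bag within each group (append DESC for largest-first).
    X = ORDER A BY orderingParam;
    -- Keep only the first 100 tuples of the sorted bag.
    Y = LIMIT X 100;
    GENERATE group, Y;
};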
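
For context, the same pattern can be sketched as a complete script. The input path, delimiter, field names, types, and output path below are all hypothetical, and DESC assumes that "top" means the largest values of orderingParam; drop it if the smallest values are wanted.

```pig
-- Hypothetical input: tab-separated rows of (key, orderingParam, payload).
A = LOAD 'input.tsv' USING PigStorage('\t')
    AS (key:chararray, orderingParam:long, payload:chararray);

-- One tuple per key, each carrying a bag named A with that key's records.
B = GROUP A BY key;

C = FOREACH B {
    -- Sort each group's bag, largest orderingParam first (an assumption).
    X = ORDER A BY orderingParam DESC;
    -- Keep at most 100 tuples from the sorted bag.
    Y = LIMIT X 100;
    -- FLATTEN turns the bag back into ordinary rows instead of (group, bag).
    GENERATE FLATTEN(Y);
};

-- Hypothetical output location.
STORE C INTO 'top100_per_key';
```

Generating `group, Y` instead, as in the snippet above, keeps one row per key with the top records nested in a bag; FLATTEN is only a convenience when flat rows are easier to consume downstream.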

-Kris

On Fri, Jun 29, 2012 at 04:19:18PM -0700, Benjamin Juhn wrote:
> Hi there,
>
> I'm trying to write a group by statement, only returning the top 100 records from each group.  Does pig support this?
>
> Thanks,
> Ben

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
Jonathan Coveney 2012-06-30, 01:39
Kris Coward 2012-06-30, 04:47
Austin Stickney 2012-06-29, 23:55