Pig >> mail # user >> ordering tuple after grouping


Chan, Tim 2012-04-16, 20:31
Gianmarco De Francisci Mo... 2012-04-16, 20:43
Chan, Tim 2012-04-16, 23:03
Russell Jurney 2012-04-17, 00:22
Dmitriy Ryaboy 2012-04-17, 07:47
Re: ordering tuple after grouping
Hi Dmitriy,

Can you explain what the difference in the execution plan is?
And if there is a performance difference, shouldn't we try to fix it?

Cheers,
--
Gianmarco

On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> This works, but isn't the most efficient thing in the world.
> Try using the TOP udf instead.
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
>
> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney
> <[EMAIL PROTECTED]> wrote:
> > Or even:
> >
> > ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }
> >
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[EMAIL PROTECTED]> wrote:
> >
> >> Dear Gianmarco,
> >>
> >> It works great! Thanks.
> >>
> >> Tim
> >> ________________________________________
> >> From: Gianmarco De Francisci Morales [[EMAIL PROTECTED]]
> >> Sent: Monday, April 16, 2012 1:43 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: ordering tuple after grouping
> >>
> >> Sure,
> >> use a nested foreach.
> >>
> >> grouped = group data by $0;
> >> ordered = foreach grouped {
> >>  sorted = order data by $1;
> >>  first = limit sorted 1;
> >>  generate first;
> >> }
> >>
> >> Beware, untested code.
> >>
> >> Cheers,
> >> --
> >> Gianmarco
> >>
> >>
> >>
> >> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[EMAIL PROTECTED]> wrote:
> >>
> >>> Given data:
> >>>
> >>> (1, 55, abc)
> >>> (2, 23, asd)
> >>> (1, 85, xyz)
> >>> (1, 2, aaa)
> >>>
> >>>
> >>> I would like to group on $0 and then have my grouped tuple be ordered by $1.
> >>> Is this possible?
> >>>
> >>> The output should look like this:
> >>>
> >>> (1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
> >>> (2, {(2,23,asd)})
> >>>
> >>>
> >>> Then I would like to keep the first tuple for every group.
> >>>
> >>> For example:
> >>>
> >>> (1,2,aaa)
> >>> (2,23,asd)
> >>>
> >>>
> >>>
> >>
>
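
For reference, the two approaches discussed in this thread can be put side by side in one script. This is a minimal, untested sketch: the file name 'data.txt', the field names (id, val, name), and the comma-separated layout are assumptions made for illustration, not details given by the posters. The TOP call follows the (n, column index, bag) form described in the linked TOP documentation. As far as that documentation describes it, TOP keeps the tuples with the largest values in the chosen column, so on its own it picks the other end of the ordering from "ORDER ... LIMIT 1", which keeps the smallest.

```pig
-- Assumption: 'data.txt' is a hypothetical file with one record per line,
-- comma-separated and without spaces or parentheses, e.g.
--   1,55,abc
--   2,23,asd
--   1,85,xyz
--   1,2,aaa
-- The field names below are illustrative only.
data = LOAD 'data.txt' USING PigStorage(',') AS (id:int, val:long, name:chararray);

grouped = GROUP data BY id;

-- Approach 1 (nested foreach, as posted in the thread):
-- sort each group's bag by val, then keep only the first tuple.
first_by_sort = FOREACH grouped {
    sorted = ORDER data BY val;
    first = LIMIT sorted 1;
    GENERATE FLATTEN(first);
};

-- Approach 2 (TOP UDF, as suggested in the thread):
-- TOP(n, column_index, bag) retains n tuples from the bag, ranked by the
-- field at column_index (0-based). Here: 1 tuple per group, ranked by val.
-- Note: this is expected to keep the tuple with the LARGEST val in each group.
largest_by_top = FOREACH grouped GENERATE FLATTEN(TOP(1, 1, data));

DUMP first_by_sort;
DUMP largest_by_top;
```

With the sample data from the original question, the first relation should correspond to the (1,2,aaa) and (2,23,asd) result the poster asked for. Getting the smallest value out of TOP needs an extra step (for example, ranking on a negated copy of the column), which is left out here.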
Dmitriy Ryaboy 2012-04-17, 09:50
Gianmarco De Francisci Mo... 2012-04-17, 10:52