Pig, mail # user - ordering tuple after grouping


Re: ordering tuple after grouping
Gianmarco De Francisci Mo... 2012-04-17, 08:03
Hi Dmitriy,

Can you explain what the difference in the execution plan is?
And if there is a performance difference, shouldn't we try to fix it?

Cheers,
--
Gianmarco

On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> This works, but isn't the most efficient thing in the world.
> Try using the TOP udf instead.
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
>
> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney
> <[EMAIL PROTECTED]> wrote:
> > Or even:
> >
> > ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }
> >
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[EMAIL PROTECTED]> wrote:
> >
> >> Dear Gianmarco,
> >>
> >> It works great! Thanks.
> >>
> >> Tim
> >> ________________________________________
> >> From: Gianmarco De Francisci Morales [[EMAIL PROTECTED]]
> >> Sent: Monday, April 16, 2012 1:43 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: ordering tuple after grouping
> >>
> >> Sure,
> >> use a nested foreach.
> >>
> >> grouped = group data by $0;
> >> ordered = foreach grouped {
> >>  sorted = order data by $1;
> >>  first = limit sorted 1;
> >>  generate first;
> >> }
> >>
> >> Beware, untested code.
> >>
> >> Cheers,
> >> --
> >> Gianmarco
> >>
> >>
> >>
> >> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[EMAIL PROTECTED]> wrote:
> >>
> >>> Given data:
> >>>
> >>> (1, 55, abc)
> >>> (2, 23, asd)
> >>> (1, 85, xyz)
> >>> (1, 2, aaa)
> >>>
> >>>
> >>> I would like to group on $0 and then have my grouped tuple be ordered by
> >>> $1. Is this possible?
> >>>
> >>> The output should look like this:
> >>>
> >>> (1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
> >>> (2, {(2,23,asd)})
> >>>
> >>>
> >>> Then I would like to keep the first tuple for every group.
> >>>
> >>> For example:
> >>>
> >>> (1,2,aaa)
> >>> (2,23,asd)
> >>>
> >>>
> >>>
> >>
>
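For reference, the TOP suggestion quoted above could be sketched roughly as follows. This is untested and rests on assumptions not stated in the thread: that `data` is already loaded and that $1 is a numeric field. The alias `top1` is only an illustrative name. Note that TOP keeps the tuples with the largest values in the chosen column, so unlike the order/limit version it returns the maximum of $1 per group, not the minimum.

```pig
-- Untested sketch. Assumes 'data' is already loaded and $1 is numeric.
grouped = group data by $0;

-- TOP(n, column, bag) keeps the n tuples of the bag with the largest
-- value in the given column (0-based), here column 1 of each group's bag.
top1 = foreach grouped generate flatten(TOP(1, 1, data));
```

With Tim's sample data this should give (1,85,xyz) and (2,23,asd). As far as I understand it, the likely saving is that TOP only has to hold n tuples in a priority queue while scanning the bag, whereas the nested order/limit sorts each group's whole bag before discarding all but one tuple.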