Pig, mail # user - ordering tuple after grouping


Re: ordering tuple after grouping
Gianmarco De Francisci Mo... 2012-04-17, 10:52
I see, I hadn't understood your suggestion.
You meant replacing both ORDER and LIMIT with TOP.
Makes sense, thanks.

Cheers,
--
Gianmarco

On Tue, Apr 17, 2012 at 11:50, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Top doesn't need to sort the whole relation; it can be done in a streaming
> fashion over any collection (n log k, where k << n). Plus it's algebraic
> (associative), since top 10 of a set is top 10 of all the top 10s of a
> covering collection of subsets.
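Dmitriy's two claims can be checked with a small model. The sketch below is plain Python, not Pig, and is only an illustration under stated assumptions: `top_k` and `merge_partials` are hypothetical helper names, and "top" is taken to mean the largest values in the chosen column. A min-heap that never holds more than k entries is updated once per row, which is where the n log k cost comes from. Merging per-chunk results shows the algebraic property: the top k of the union of the partial top-k lists equals the top k of the whole input.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, int, str]


def top_k(rows: Iterable[Row], k: int, col: int) -> List[Row]:
    """Return the k rows with the largest value in column `col`, largest first.

    One streaming pass: the min-heap never holds more than k entries, so each
    row costs O(log k) and n rows cost O(n log k), with no full sort.
    """
    if k <= 0:
        return []
    heap: List[Tuple[int, int, Row]] = []
    for i, row in enumerate(rows):
        entry = (row[col], i, row)  # i breaks ties so rows are never compared
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:  # beats the weakest current winner
            heapq.heapreplace(heap, entry)
    return [row for _, _, row in sorted(heap, reverse=True)]


def merge_partials(partials: Iterable[List[Row]], k: int, col: int) -> List[Row]:
    """Combine per-chunk top-k lists into the global top k (the algebraic step)."""
    return top_k((row for part in partials for row in part), k, col)


data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]
chunks = [data[:2], data[2:]]  # e.g. the input split across two tasks
partials = [top_k(chunk, 2, col=1) for chunk in chunks]
print(top_k(data, 2, col=1))               # [(1, 85, 'xyz'), (1, 55, 'abc')]
print(merge_partials(partials, 2, col=1))  # same result, built from the partials
```

Because the partial results can be computed independently and then combined, an engine can in principle do the first pass early (map side or in a combiner) and only the merge in the reducer; whether Pig does exactly this for a given script is best checked with EXPLAIN.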
>
> On Apr 17, 2012, at 1:03 AM, Gianmarco De Francisci Morales <
> [EMAIL PROTECTED]> wrote:
>
> > Hi Dmitriy,
> >
> > Can you explain what the difference in the execution plan is?
> > And if there is a performance difference, shouldn't we try to fix it?
> >
> > Cheers,
> > --
> > Gianmarco
> >
> >
> >
> > On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> >
> >> This works, but isn't the most efficient thing in the world.
> >> Try using the TOP UDF instead.
> >> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
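Applied to Tim's question, the same idea replaces ORDER plus LIMIT with a single pass that keeps one candidate per group. Note that Tim wants the smallest $1, while TOP, as we read its documentation, keeps the largest values, so in Pig the column would presumably have to be negated first (an assumption worth testing). The plain-Python sketch below (hypothetical helper `first_per_group`, keyed on $0 and ranked on $1) sidesteps that by tracking the per-group minimum directly; groups are emitted in key order only to make the output deterministic.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int, str]


def first_per_group(rows: Iterable[Row]) -> List[Row]:
    """For each key $0, keep the row with the smallest $1 (a top-1, ascending).

    One pass and one stored row per group: no bag is ever sorted.
    """
    best: Dict[int, Row] = {}
    for row in rows:
        current = best.get(row[0])
        # Strict < keeps the first row seen when two rows tie on $1.
        if current is None or row[1] < current[1]:
            best[row[0]] = row
    # Sort by key only so the output order is deterministic.
    return [best[key] for key in sorted(best)]


data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]
print(first_per_group(data))  # [(1, 2, 'aaa'), (2, 23, 'asd')]
```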
> >>
> >> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney
> >> <[EMAIL PROTECTED]> wrote:
> >>> Or even:
> >>>
> >>> ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate flatten(first); }
> >>>
> >>>
> >>> Russell Jurney http://datasyndrome.com
> >>>
> >>> On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Dear Gianmarco,
> >>>>
> >>>> It works great! Thanks.
> >>>>
> >>>> Tim
> >>>> ________________________________________
> >>>> From: Gianmarco De Francisci Morales [[EMAIL PROTECTED]]
> >>>> Sent: Monday, April 16, 2012 1:43 PM
> >>>> To: [EMAIL PROTECTED]
> >>>> Subject: Re: ordering tuple after grouping
> >>>>
> >>>> Sure,
> >>>> use a nested foreach.
> >>>>
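> >>>> grouped = group data by $0;
> >>>> ordered = foreach grouped {
> >>>>     sorted = order data by $1;    -- ascending by default
> >>>>     first = limit sorted 1;       -- keep the tuple with the smallest $1
> >>>>     generate flatten(first);      -- flat tuples such as (1,2,aaa)
> >>>> }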
> >>>>
> >>>> Beware, untested code.
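For comparison, here is a plain-Python model of what the nested foreach computes (the helper name `nested_order_limit` is hypothetical). Each group's bag is fully sorted by $1 and then truncated, which costs O(m log m) for a group of size m, whereas a top-1 only needs a linear scan. The model returns flat tuples, i.e. the shape you get when the limited bag is passed through FLATTEN, which matches Tim's expected output.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple

Row = Tuple[int, int, str]


def nested_order_limit(rows: Iterable[Row], limit: int = 1) -> List[Row]:
    """Model of: group data by $0, then per group order by $1 and limit N.

    Every group's bag is fully sorted before being truncated.
    """
    result: List[Row] = []
    by_key = sorted(rows, key=itemgetter(0))  # groupby only merges adjacent keys
    for _, bag in groupby(by_key, key=itemgetter(0)):
        ordered = sorted(bag, key=itemgetter(1))  # ORDER ... BY $1 (ascending)
        result.extend(ordered[:limit])            # LIMIT, emitted as flat tuples
    return result


data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]
print(nested_order_limit(data))  # [(1, 2, 'aaa'), (2, 23, 'asd')]
```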
> >>>>
> >>>> Cheers,
> >>>> --
> >>>> Gianmarco
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Given data:
> >>>>>
> >>>>> (1, 55, abc)
> >>>>> (2, 23, asd)
> >>>>> (1, 85, xyz)
> >>>>> (1, 2, aaa)
> >>>>>
> >>>>>
> >>>>> I would like to group on $0 and then have the tuples inside each group ordered by $1. Is this possible?
> >>>>>
> >>>>> The output should look like this:
> >>>>>
> >>>>> (1, {(1,2,aaa),(1,55,abc),(1,85,xyz)})
> >>>>> (2, {(2,23,asd)})
> >>>>>
> >>>>>
> >>>>> Then I would like to keep the first tuple for every group.
> >>>>>
> >>>>> For example:
> >>>>>
> >>>>> (1,2,aaa)
> >>>>> (2,23,asd)
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
>