-
ordering tuple after grouping
Chan, Tim 2012-04-16, 20:31
Given data:
(1, 55, abc)
(2, 23, asd)
(1, 85, xyz)
(1, 2, aaa)

I would like to group on $0 and then have my grouped tuple be ordered by $1. Is this possible?

The output should look like this:

(1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
(2, {(2,23,asd)})

Then I would like to keep the first tuple for every group.

For example:

(1,2,aaa)
(2,23,asd)
-
Re: ordering tuple after grouping
Gianmarco De Francisci Mo... 2012-04-16, 20:43
Sure, use a nested foreach.
grouped = group data by $0;
ordered = foreach grouped {
    sorted = order data by $1;
    first = limit sorted 1;
    generate first;
}
Beware, untested code.
Cheers, -- Gianmarco
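For readers who want to check the intended semantics outside Pig, here is a rough Python sketch of what the nested foreach above is meant to compute on the sample data from the question: group on the first field, order each group by the second field, keep the first tuple. The function name `first_per_group` is made up for illustration, and this is only a model of the logic, not how Pig executes it.

```python
from collections import defaultdict

# Sample rows from the original question: ($0, $1, $2).
data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]


def first_per_group(rows):
    """Group rows by field 0, order each group by field 1, keep the first row.

    Hypothetical helper; it mirrors the group / order / limit 1 steps only.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # group data by $0

    result = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda r: r[1])  # order data by $1
        result.append(ordered[0])  # limit sorted 1
    return result


print(first_per_group(data))  # [(1, 2, 'aaa'), (2, 23, 'asd')]
```

The printed result matches the output asked for in the original message.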
-
RE: ordering tuple after grouping
Chan, Tim 2012-04-16, 23:03
Dear Gianmarco,
It works great! Thanks.
Tim
-
Re: ordering tuple after grouping
Russell Jurney 2012-04-17, 00:22
Or even:

ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }

Russell Jurney http://datasyndrome.com
-
Re: ordering tuple after grouping
Dmitriy Ryaboy 2012-04-17, 07:47
This works, but isn't the most efficient thing in the world. Try using the TOP UDF instead.
http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
-
Re: ordering tuple after grouping
Gianmarco De Francisci Mo... 2012-04-17, 08:03
Hi Dmitriy,

Can you explain what the difference in the execution plan is? And if there is a performance difference, shouldn't we try to fix it?

Cheers,
--
Gianmarco
-
Re: ordering tuple after grouping
Dmitriy Ryaboy 2012-04-17, 09:50
TOP doesn't need to sort the whole relation; it can be computed in a streaming fashion over any collection (O(n log k), where k << n). Plus it's algebraic (associative), since the top 10 of a set is the top 10 of all the top 10s of a covering collection of subsets.
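The two properties mentioned above can be illustrated with a small Python sketch. It is not Pig's TOP implementation; `top_k` is a hypothetical helper that keeps a heap of at most k entries, so each input row costs O(log k) instead of requiring a full sort. The last lines check the algebraic property on the sample data: the top 2 of the partial top-2 results equals the top 2 of the whole input (assuming distinct values in the compared column).

```python
import heapq


def top_k(rows, k, column):
    """Return the k rows with the largest value in `column`, largest first.

    Hypothetical helper: streams over `rows` once with a min-heap of size k.
    """
    heap = []  # entries are (value, sequence, row); heap[0] is the smallest kept
    for seq, row in enumerate(rows):
        entry = (row[column], seq, row)  # seq breaks ties, rows are never compared
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current smallest
    return [row for _, _, row in sorted(heap, reverse=True)]


rows = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]

# Top 2 computed directly over all rows.
direct = top_k(rows, 2, 1)

# Top 2 computed per subset, then combined: the algebraic property.
partials = top_k(rows[:2], 2, 1) + top_k(rows[2:], 2, 1)
combined = top_k(partials, 2, 1)

print(direct)    # [(1, 85, 'xyz'), (1, 55, 'abc')]
print(combined)  # [(1, 85, 'xyz'), (1, 55, 'abc')]
```

This sketch keeps the largest values; keeping the smallest, as in the original question, would need the comparison inverted (for example by negating a numeric column).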
-
Re: ordering tuple after grouping
Gianmarco De Francisci Mo... 2012-04-17, 10:52
I see, I hadn't understood your suggestion. You meant replacing both ORDER and LIMIT with TOP. Makes sense, thanks.

Cheers,
--
Gianmarco