Pig >> mail # user >> ordering tuple after grouping


Re: ordering tuple after grouping
I see, I hadn't understood your suggestion.
You meant replacing both ORDER and LIMIT with TOP.
Makes sense, thanks.

Cheers,
--
Gianmarco

On Tue, Apr 17, 2012 at 11:50, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> TOP doesn't need to sort the whole relation; it can be computed in a streaming
> fashion over any collection (O(n log k) time, where k << n). It is also
> algebraic (associative), since the top 10 of a set is the top 10 of all the
> top 10s of a covering collection of subsets.
>
> On Apr 17, 2012, at 1:03 AM, Gianmarco De Francisci Morales <
> [EMAIL PROTECTED]> wrote:
>
> > Hi Dmitriy,
> >
> > Can you explain what the difference in the execution plan is?
> > And if there is a performance difference, shouldn't we try to fix it?
> >
> > Cheers,
> > --
> > Gianmarco
> >
> >
> >
> > On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> >
> >> This works, but isn't the most efficient thing in the world.
> >> Try using the TOP udf instead.
> >> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
> >>
> >> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney
> >> <[EMAIL PROTECTED]> wrote:
> >>> Or even:
> >>>
> >>> ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }
> >>>
> >>>
> >>> Russell Jurney http://datasyndrome.com
> >>>
> >>> On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Dear Gianmarco,
> >>>>
> >>>> It works great! Thanks.
> >>>>
> >>>> Tim
> >>>> ________________________________________
> >>>> From: Gianmarco De Francisci Morales [[EMAIL PROTECTED]]
> >>>> Sent: Monday, April 16, 2012 1:43 PM
> >>>> To: [EMAIL PROTECTED]
> >>>> Subject: Re: ordering tuple after grouping
> >>>>
> >>>> Sure,
> >>>> use a nested foreach.
> >>>>
> >>>> grouped = group data by $0;
> >>>> ordered = foreach grouped {
> >>>> sorted = order data by $1;
> >>>> first = limit sorted 1;
> >>>> generate first;
> >>>> }
> >>>>
> >>>> Beware, untested code.
> >>>>
> >>>> Cheers,
> >>>> --
> >>>> Gianmarco
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Given data:
> >>>>>
> >>>>> (1, 55, abc)
> >>>>> (2, 23, asd)
> >>>>> (1, 85, xyz)
> >>>>> (1, 2, aaa)
> >>>>>
> >>>>>
> >>>>> I would like to group on $0 and then have my grouped tuple be ordered by
> >>>>> $1. Is this possible?
> >>>>>
> >>>>> The output should look like this:
> >>>>>
> >>>>> (1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
> >>>>> (2, {(2,23,asd)})
> >>>>>
> >>>>>
> >>>>> Then I would like to keep the first tuple for every group.
> >>>>>
> >>>>> For example:
> >>>>>
> >>>>> (1,2,aaa)
> >>>>> (2,23,asd)
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
>
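
Dmitriy's two points about TOP (a one-pass O(n log k) computation, and the algebraic property that lets partial results be merged) can be illustrated outside Pig. The following is a minimal Python sketch, not Pig's actual implementation: the function name top_k, the Row type, and the way the sample data is split into two partitions are all assumptions made for illustration. It keeps the largest values, which is how TOP is understood to behave by default; the original question asks for the smallest $1 per group, which would need the comparison reversed.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, int, str]  # hypothetical row shape matching the thread's sample data


def top_k(rows: Iterable[Row], k: int, column: int = 1) -> List[Row]:
    """Return the k rows with the largest value in `column`, largest first.

    One pass over the input with a min-heap of at most k entries,
    so the cost is O(n log k) rather than the O(n log n) of a full sort.
    """
    heap: list = []  # entries are (key, arrival index, row); heap[0] holds the smallest kept key
    for seq, row in enumerate(rows):
        entry = (row[column], seq, row)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # New row beats the weakest kept row: swap it in.
            heapq.heapreplace(heap, entry)
    return [row for _, _, row in sorted(heap, reverse=True)]


# Sample data from the original question.
data: List[Row] = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]

# Direct computation over the whole collection.
direct = top_k(data, 2)

# Algebraic property: compute partial results per partition, then merge them.
part_a, part_b = data[:2], data[2:]  # arbitrary split, assumed for illustration
merged = top_k(top_k(part_a, 2) + top_k(part_b, 2), 2)

print(direct)
print(merged)
```

Because the partial results can be merged this way, the per-partition work can happen early (for example in a combiner) and only k rows per partition need to be passed on, which is what makes this approach cheaper than ordering every group in full and then applying LIMIT.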