Pig user mailing list: Ordering and limiting Tuples inside a Bag


James Newhaven 2012-05-09, 11:56
Steve Bernstein 2012-05-09, 15:55
James Newhaven 2012-05-09, 16:33
James Newhaven 2012-05-09, 17:39
Re: Ordering and limiting Tuples inside a Bag
You might want to use the TOP UDF, which is more efficient for this task
(as I was taught on this list :).
http://pig.apache.org/docs/r0.10.0/func.html#topx
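A likely reason TOP is cheaper than sorting is that picking the n largest tuples only needs a bounded heap of n candidates, not a full sort of the bag. The Python below is a small model of that idea, not Pig's implementation. It assumes the signature TOP(topN, column, relation) from the linked docs, with column as a zero-based index, so for a bag of (product, count) tuples the call would be roughly TOP(3, 1, products) inside a FOREACH. The names top_n and new_york are hypothetical, and the ("kiwis", 5) tuple is made-up filler.

```python
import heapq
from typing import List, Tuple


def top_n(n: int, column: int, bag: List[Tuple]) -> List[Tuple]:
    """Return the n tuples with the largest value in position `column`.

    Models the idea behind Pig's TOP(topN, column, relation) with a bounded
    heap: each tuple is examined once and only n candidates are kept, so the
    full bag is never sorted (roughly O(m log n) instead of O(m log m)).
    heapq.nlargest also returns its result sorted largest-first; that is a
    property of this sketch, not a documented guarantee of Pig's TOP.
    """
    return heapq.nlargest(n, bag, key=lambda t: t[column])


# One city's bag of (product, count) tuples. The counts for apples, oranges
# and pears come from the question; ("kiwis", 5) is made-up filler.
new_york = [("pears", 23), ("apples", 50), ("kiwis", 5), ("oranges", 34)]

# Column 1 is the zero-based index of the count field.
print(top_n(3, 1, new_york))  # [('apples', 50), ('oranges', 34), ('pears', 23)]
```

If the final order matters, it may still be worth ordering TOP's output explicitly in Pig rather than relying on the order it happens to return.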

Cheers,
--
Gianmarco
On Wed, May 9, 2012 at 7:39 PM, James Newhaven <[EMAIL PROTECTED]> wrote:

> Ok, figured out the nested foreach. Thanks for your help.
>
> Regards,
> James
>
>
>
> On Wed, May 9, 2012 at 5:33 PM, James Newhaven <[EMAIL PROTECTED]> wrote:
>
> > Thanks Steve,
> >
> > Yes, I did discover nested foreach, but I can't get the syntax right. Can
> > anyone help me get started on how it's meant to look?
> >
> > Regards,
> > James
> >
> >
> > On Wed, May 9, 2012 at 4:55 PM, Steve Bernstein <[EMAIL PROTECTED]> wrote:
> >
> >> You can. Check out nested FOREACH: ORDER BY, then LIMIT (see, for example,
> >> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html).
> >>
> >> _____________
> >> Steve Bernstein
> >> VP, Analytics
> >> Rearden Commerce, Inc.
> >>
> >> +1.408.499.0961 Mobile
> >>
> >> deem.com | reardencommerce.com
> >>
> >> -----Original Message-----
> >> From: James Newhaven [mailto:[EMAIL PROTECTED]]
> >> Sent: Wednesday, May 09, 2012 4:57 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Ordering and limiting Tuples inside a Bag
> >>
> >> Hi,
> >>
> >> Another newbie Pig question.
> >>
> >> I have a relation with a structure like this: (city, {(product, count),
> >> (product, count), ...}).
> >>
> >> This relation tracks product counts for each city: each tuple contains a
> >> city name followed by a bag of products, each with an inventory count.
> >>
> >> Is it possible in Pig to extract only the top 3 products with the
> >> highest counts for each city, ordered from highest to lowest?
> >>
> >> Ideally, I would like the output to be like this:
> >>
> >> (New York City, ((apples, 50), (oranges, 34), (pears, 23)))
> >> (Another City, ((oranges, 52), (pears, 32), (apples, 12)))
> >>
> >> Thanks,
> >> James
> >>
> >
> >
>
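The approach Steve describes and James got working (a nested FOREACH that orders each city's bag by count and then limits it to three tuples) can be modeled in a few lines of Python. This is a sketch of the semantics only, not Pig code. It assumes the data starts as flat (city, product, count) rows that are first grouped by city, the function names group_by_city and top_per_city are hypothetical, and the two "kiwis" rows are made-up filler. In Pig itself the per-group steps would be an ORDER ... BY ... DESC followed by a LIMIT inside the FOREACH block.

```python
from operator import itemgetter
from typing import Dict, List, Tuple

# Flat (city, product, count) rows. The values mirror the question's
# desired output; the two "kiwis" rows are made-up filler that the LIMIT
# step should drop.
rows = [
    ("New York City", "apples", 50),
    ("New York City", "pears", 23),
    ("New York City", "oranges", 34),
    ("New York City", "kiwis", 5),
    ("Another City", "apples", 12),
    ("Another City", "oranges", 52),
    ("Another City", "pears", 32),
    ("Another City", "kiwis", 7),
]


def group_by_city(rows: List[Tuple[str, str, int]]) -> Dict[str, List[Tuple[str, int]]]:
    """Model of GROUP ... BY city: map each city to a bag of (product, count)."""
    groups: Dict[str, List[Tuple[str, int]]] = {}
    for city, product, count in rows:
        groups.setdefault(city, []).append((product, count))
    return groups


def top_per_city(groups: Dict[str, List[Tuple[str, int]]], limit: int = 3):
    """Model of a nested FOREACH: per city, ORDER the bag BY count DESC, then LIMIT."""
    result = []
    for city, bag in groups.items():
        ordered = sorted(bag, key=itemgetter(1), reverse=True)  # ORDER bag BY count DESC
        result.append((city, ordered[:limit]))  # LIMIT to `limit` tuples
    return result


for city, top in top_per_city(group_by_city(rows)):
    print(city, top)
```

Sorting each whole bag before limiting is the simplest way to express the question; the TOP function suggested earlier in the thread aims to avoid that full sort.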