-
Ordering and limiting Tuples inside a Bag
James Newhaven 2012-05-09, 11:56
Hi,
Another newbie Pig question.
I have a relation with a structure like this: (city, { (product, count), (product, count) }).
This relation tracks product counts for each city: each tuple contains a city name and a bag of products, each with an inventory count.
Is it possible in Pig to extract only the 3 products with the highest counts for each city, ordered from highest to lowest?
Ideally, I would like the output to be like this:
(New York City, {(apples, 50), (oranges, 34), (pears, 23)})
(Another City, {(oranges, 52), (pears, 32), (apples, 12)})
Thanks, James
-
RE: Ordering and limiting Tuples inside a Bag
Steve Bernstein 2012-05-09, 15:55
You can. Check out nested foreach: order by, then limit (see, for example, http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html).
Steve Bernstein
VP, Analytics, Rearden Commerce, Inc.
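For reference, a minimal sketch of what the nested foreach might look like, assuming the relation already has the (city, bag) shape described in the question. The relation name cities, the input path, and the field names products, product and cnt are made up for illustration; cnt stands in for the count field.

```pig
-- Hypothetical input: one tuple per city, holding a bag of (product, cnt) tuples.
cities = LOAD 'cities' AS (city:chararray,
                           products:bag{t:(product:chararray, cnt:int)});

top3 = FOREACH cities {
    -- Sort the tuples inside this city's bag, highest count first.
    sorted = ORDER products BY cnt DESC;
    -- Keep only the first three tuples of the sorted bag.
    best = LIMIT sorted 3;
    GENERATE city, best;
};

DUMP top3;
```

Bags are formally unordered in Pig, so treat the order of the tuples inside best as something to verify on your version rather than a guarantee.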
-
Re: Ordering and limiting Tuples inside a Bag
James Newhaven 2012-05-09, 16:33
Thanks Steve,
Yes, I did discover nested foreach, but I can't get the syntax right. Can anyone help get me started on how it's meant to look?
Regards, James
-
Re: Ordering and limiting Tuples inside a Bag
James Newhaven 2012-05-09, 17:39
OK, figured out the nested foreach. Thanks for your help.
Regards, James
-
Re: Ordering and limiting Tuples inside a Bag
Gianmarco De Francisci Mo... 2012-05-09, 21:06
You might want to use the TOP UDF, which is more efficient for the same task (as I was taught on this list :).
http://pig.apache.org/docs/r0.10.0/func.html#topx
Cheers,
-- Gianmarco
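A rough sketch of the TOP variant, under the same assumptions as before: the relation name cities, the input path, and the field names products, product and cnt are illustrative and not from the thread. TOP takes the number of tuples to keep, the 0-based index of the column to compare, and the bag.

```pig
-- Hypothetical input: one tuple per city, holding a bag of (product, cnt) tuples.
cities = LOAD 'cities' AS (city:chararray,
                           products:bag{t:(product:chararray, cnt:int)});

-- Column index 1 is cnt, since columns are numbered from 0.
top3 = FOREACH cities GENERATE city, TOP(3, 1, products) AS best;

-- TOP selects the three largest tuples; to be safe about their order,
-- sort the small resulting bag by position ($1 is the count column).
ordered = FOREACH top3 {
    sorted = ORDER best BY $1 DESC;
    GENERATE city, sorted;
};

DUMP ordered;
```

The second foreach only sorts a bag of at most three tuples per city, so it should add little cost on top of TOP itself.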