-
Get field from bag with constraints from same relation
Thomas Bach 2013-01-22, 11:55
Hi there,
I have the following data
4 {(1,abc),(2,cde),(5,efg)}
2 {(1,foo),(2,bar),(5,baz)}
7 {(1,bounce),(2,frotz),(5,trotz)}
What I ultimately want is, for each row, the string associated with the largest number in the bag that is less than or equal to the first number in the row, i.e.:
(4,cde)
(2,bar)
(5,trotz)
I was thinking about filtering the data first in a foreach and then extracting the max of the resulting list (probably in the same foreach block). But I'm already stuck at:
data = load 'example_data' as (n: int, B: bag{(m: int, s: chararray)});
X = foreach data {
    A = filter B by m <= n;
    GENERATE n, A;
}
ERROR 1025: <line 14, column 23> Invalid field projection. Projected field [n] does not exist in schema: m:int,s:chararray.
Any help highly appreciated.
Regards, Thomas Bach.
-
Re: Get field from bag with constraints from same relation
Thomas Bach 2013-01-22, 16:24
On Tue, Jan 22, 2013 at 12:55:22PM +0100, Thomas Bach wrote:
> Hi there,
>
> I have the following data
>
> 4 {(1,abc),(2,cde),(5,efg)}
> 2 {(1,foo),(2,bar),(5,baz)}
> 7 {(1,bounce),(2,frotz),(5,trotz)}
>
> what I finally want to achieve is a list of all strings related to the
> largest number in the tuple that is less-equal the first number in
> the row. i.e.:
>
> (4,cde)
> (2,bar)
> (5,trotz)
>
This should be
(4,cde)
(2,bar)
(7,trotz)
of course.
Regards, Thomas Bach.
-
Re: Get field from bag with constraints from same relation
Cheolsoo Park 2013-01-22, 19:31
Hi Thomas,
Try this:
data1 = LOAD '1.txt' USING PigStorage('|') AS (n:int, B:bag{(m:int,s:chararray)});
data2 = FOREACH data1 GENERATE n, FLATTEN(B);
data3 = FILTER data2 BY B::m <= n;
data4 = GROUP data3 BY n;
data5 = FOREACH data4 {
    data6 = ORDER data3 BY B::m DESC;
    data7 = LIMIT data6 1;
    GENERATE data7;
}
data8 = FOREACH data5 GENERATE FLATTEN(data7);
data9 = FOREACH data8 GENERATE n, B::s;
DUMP data9;
The input is:
4|{(1,abc),(2,cde),(5,efg)}
2|{(1,foo),(2,bar),(5,baz)}
7|{(1,bounce),(2,frotz),(5,trotz)}
The output is:
(2,bar)
(4,cde)
(7,trotz)
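The flatten step is needed because a nested FILTER on B inside a FOREACH block only sees B's own fields (m and s), which is what ERROR 1025 reports; flattening puts n and m into the same tuple so they can be compared. As a possible shortening of the ORDER/LIMIT part, the built-in TOP function might be used instead. The following is an untested sketch under that assumption: the aliases data5 and data6 are reused here with a different meaning than in the script above, and positional references ($0, $2) are used so that the sketch does not depend on the field names TOP returns.

```pig
-- Same input file and schema as in the script above.
data1 = LOAD '1.txt' USING PigStorage('|')
        AS (n:int, B:bag{(m:int, s:chararray)});

-- Flatten first so that n and m sit in the same tuple and can be compared.
data2 = FOREACH data1 GENERATE n, FLATTEN(B);
data3 = FILTER data2 BY B::m <= n;
data4 = GROUP data3 BY n;

-- TOP(topN, column, bag): keep the 1 tuple with the largest value in
-- column 1 (0-based), which is B::m in the tuple (n, B::m, B::s).
data5 = FOREACH data4 GENERATE FLATTEN(TOP(1, 1, data3));

-- Project n and s by position rather than by name (assumption: the
-- column order of data3 is preserved by TOP).
data6 = FOREACH data5 GENERATE $0 AS n, $2 AS s;
DUMP data6;
```

Note that both variants group by n, so input rows that share the same value of n would be merged into a single result; if that can happen in the real data, a unique row key would be needed as the grouping field.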
Thanks, Cheolsoo
-
Re: Get field from bag with constraints from same relation
Thomas Bach 2013-01-23, 14:32
On Tue, Jan 22, 2013 at 11:31:23AM -0800, Cheolsoo Park wrote:
>
> Try this:
>
> data1 = LOAD '1.txt' USING PigStorage('|') AS (n:int,
> B:bag{(m:int,s:chararray)});
> data2 = FOREACH data1 GENERATE n, FLATTEN(B);
> data3 = FILTER data2 BY B::m <= n;
> data4 = GROUP data3 BY n;
> data5 = FOREACH data4 {
> data6 = ORDER data3 BY B::m DESC;
> data7 = LIMIT data6 1;
> GENERATE data7;
> }
> data8 = FOREACH data5 GENERATE FLATTEN(data7);
> data9 = FOREACH data8 GENERATE n, B::s;
> DUMP data9;
>
> The input is:
> 4|{(1,abc),(2,cde),(5,efg)}
> 2|{(1,foo),(2,bar),(5,baz)}
> 7|{(1,bounce),(2,frotz),(5,trotz)}
>
> The output is:
> (2,bar)
> (4,cde)
> (7,trotz)
It's much more complicated than I thought. :/
But, it works like a charm. Thank you! :)
Regards, Thomas Bach.