-
Cross Product of Two Tuples?
Eli Finkelshteyn 2012-04-04, 18:18
Hi Folks, I'm currently trying to do something I figured would be trivial, but actually wound up being a bit of work for me, so I'm wondering if I'm missing something. All I want to do is get a cross product of two tuples. So for example, given an input of:
('hello', 'howdy', 'hi'), ('hola', 'bonjour')
I'd get:
('hello', 'hola') ('hello', 'bonjour') ('howdy', 'hola') ('howdy', 'bonjour') ('hi', 'hola') ('hi', 'bonjour')
At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's no good because the tuples are first themselves put into new tuples. So what I'm left with now is writing a dirty and slow Python UDF for this. Is there really no better way to do this? I'd think it would be a pretty standard task.
Eli
-
Re: Cross Product of Two Tuples?
Herbert Mühlburger 2012-04-04, 18:24
Hi Eli,
Have you tried CROSS [1] to compute the cross product?
[1] https://pig.apache.org/docs/r0.9.2/basic.html#cross
Regards,
Herbert
--
Herbert Muehlburger
Software Development and Business Management
Graz University of Technology
www.muehlburger.at
www.twitter.com/hmuehlburger
-
Re: Cross Product of Two Tuples?
Prashant Kommireddi 2012-04-04, 18:24
http://pig.apache.org/docs/r0.9.1/basic.html#cross
-Prashant
-
Re: Cross Product of Two Tuples?
Eli Finkelshteyn 2012-04-04, 18:40
That's for a relation only. Unless I'm missing something, it does not work for tuples. What I'm doing would require a FOREACH, I'm thinking.
Eli
-
Re: Cross Product of Two Tuples?
Jonathan Coveney 2012-04-04, 18:43
FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
-
Re: Cross Product of Two Tuples?
Eli Finkelshteyn 2012-04-04, 21:37
Nah, doesn't work because it doubles up the tuple, so that:
    TOBAG(('hello', 'howdy', 'hi'))
returns
    {(('hello', 'howdy', 'hi'))}
And so,
    FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
gets me
    ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
which is just what I started with. Anyway, to solve this problem, what I did was make a quick Python UDF to make a bag from a tuple without doubling up the tuple, and then ran FLATTEN on that, which looks like:
    bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)), FLATTEN(py_udfs.tupleToBag(t2));
Where the Python UDF I'm using is:
    @outputSchema("b:bag{}")
    def tupleToBag(tup):
        # One bag entry per field of the input tuple, each wrapped as a tuple.
        b = [tupify(i) for i in tupify(tup)]
        return b

    def tupify(tup):
        # Leave tuples alone; wrap anything else in a one-field tuple.
        if isinstance(tup, tuple):
            return tup
        return (tup,)
I'll add that into Python PiggyBank as soon as I get a chance to finish that stuff up.
Eli
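For anyone who wants to check the logic of the UDF above outside of Pig, here is a minimal sketch in plain Python. It assumes a bag can be modeled as a list of tuples, and it uses itertools.product as a stand-in for what GENERATE FLATTEN(bag1), FLATTEN(bag2) does with two bags. The names tuple_to_bag and cross_of_tuples are hypothetical, and the @outputSchema decorator is left out because it is only available inside Pig's Jython runtime.

```python
from itertools import product


def tupify(value):
    """Return value unchanged if it is already a tuple, else wrap it in a 1-tuple."""
    if isinstance(value, tuple):
        return value
    return (value,)


def tuple_to_bag(tup):
    """Model of the UDF: one tuple per element of the input tuple."""
    return [tupify(item) for item in tupify(tup)]


def cross_of_tuples(t1, t2):
    """Hypothetical stand-in for GENERATE FLATTEN(bag1), FLATTEN(bag2)."""
    # Each bag row is itself a tuple, so concatenate the two rows into one output row.
    return [left + right for left, right in product(tuple_to_bag(t1), tuple_to_bag(t2))]


rows = cross_of_tuples(('hello', 'howdy', 'hi'), ('hola', 'bonjour'))
for row in rows:
    print(row)
```

This is only a model of the data flow, not something to run inside a Pig script; it yields the six pairs from the original question in the same order.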
-
Re: Cross Product of Two Tuples?
Scott Carey 2012-04-05, 17:04
Isn't it
    FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
or
    FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0, t2::$1))
?
The inner tuple needs to be unpacked into a list of fields. TOBAG simply puts each element passed in into a bag, and if you pass t1 in there, it will be a bag with only one item.
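The difference between passing the tuple and passing its fields can be sketched in plain Python. This is a rough model of the TOBAG behaviour as reported in this thread, not Pig's actual implementation, and the behaviour may differ between Pig releases. The function name to_bag is hypothetical, and a bag is assumed to be a list of tuples.

```python
def to_bag(*fields):
    """Rough model of TOBAG as described in this thread: wrap every argument in its own tuple."""
    return [(field,) for field in fields]


t1 = ('hello', 'howdy', 'hi')

# Passing the tuple as a single argument gives a bag with one item,
# whose only field is the whole tuple.
whole = to_bag(t1)

# Passing the fields one by one gives one bag item per field.
unpacked = to_bag(*t1)

print(whole)
print(unpacked)
```

Only the second form has one row per field, which is what the later FLATTEN needs in order to produce the cross product.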
-
Re: Cross Product of Two Tuples?
Jonathan Coveney 2012-04-05, 18:25
Yup, you guys are right... it's a little annoying, but flatten first, then the two TOBAGs, then the flatten. IMHO the TOBAG of a tuple not giving you a bag is a bug, but this should work in the meantime.
-
Re: Cross Product of Two Tuples?
Scott Carey 2012-04-05, 20:35
On 4/5/12 11:25 AM, "Jonathan Coveney" wrote:
> Yup, you guys are right... it's a little annoying, but flatten first, then
> the two TOBAGs, then the flatten. IMHO the TOBAG of a tuple not giving you
> a bag is a bug, but this should work in the meantime.
I can't see how that could be a bug. What if you want to create a bag with one tuple in it?
-
Re: Cross Product of Two Tuples?
Jonathan Coveney 2012-04-05, 23:41
Well, perhaps "bug" is a heavy-handed word. "A poor user experience" might be better. I would posit that TOBAG(tuple) nine times out of ten means "make each column a row" instead of "give me a bag with a tuple of a tuple." But I'd love opinions on the matter.
-
Re: Cross Product of Two Tuples?
Scott Carey 2012-04-06, 01:23
The documentation is extremely clear:
    /**
     * This class takes a list of items and puts them into a bag
     * T = foreach U generate TOBAG($0, $1, $2);
     * It's like saying this:
     * T = foreach U generate {($0), ($1), ($2)}
     */
Adding conditionals to that seems to complicate the issue and would introduce bugs.
What happens with TOBAG(tuple1, tuple2)?
What happens when TOBAG($0) changes type? What if its type is different across rows?
Each operator should do one simple operation consistently, and not depend on the type passed in.
It's frustrating enough that FLATTEN does two things. IMO there should be one operator that explodes bags, and one that unpacks tuples, not one conflated operator that does both -- I have had to debug several issues as a result of this or a misunderstanding from new Pig users. Making TOBAG do one thing for one type of data and something else for others does not make Pig scripts maintainable or intuitive to follow IMO.
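To make the "FLATTEN does two things" point concrete, here is a small sketch in plain Python that splits the two behaviours into separate functions. It is an illustrative model only, assuming a bag is a list of tuples; the names flatten_tuple, flatten_bag and generate are hypothetical and are not Pig operators.

```python
from itertools import product


def flatten_tuple(tup):
    """FLATTEN on a tuple: its fields become top-level fields of the same single row."""
    return [tuple(tup)]


def flatten_bag(bag):
    """FLATTEN on a bag: every tuple in the bag becomes its own row."""
    return [tuple(row) for row in bag]


def generate(*columns):
    """Combine per-column row lists the way one GENERATE combines flattened columns."""
    # Every combination of rows is emitted, with the fields concatenated.
    return [sum(parts, ()) for parts in product(*columns)]


# Unpacking tuples keeps one row and only widens it.
tuple_rows = generate(flatten_tuple(('a', 'b')), flatten_tuple(('x',)))

# Exploding bags multiplies rows, giving the cross product.
bag_rows = generate(flatten_bag([('a',), ('b',)]), flatten_bag([('x',), ('y',)]))

print(tuple_rows)
print(bag_rows)
```

Seen this way, the cross product in this thread only appears once both inputs are bags; flattening the tuples directly leaves a single row.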
-
Re: Cross Product of Two Tuples?
Jonathan Coveney 2012-04-06, 06:45
A totally valid point. You swayed me :)
-
Re: Cross Product of Two Tuples?
Eli Finkelshteyn 2012-04-07, 23:27
Very much agree. Had that been the case, this would have been a far less confusing exercise. At least I feel like I have a better grasp on when FLATTEN does what now, anyway.