Re: Cross Product of Two Tuples?
Very much agree. Had that been the case, this would have been a far less
confusing exercise. At least I feel like I now have a better grasp of what
FLATTEN does and when, anyway.
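
For anyone finding this thread later, here is a minimal sketch of the
approach discussed below, as I understand it. The file name, schema, and
field names are made up for illustration; only TOBAG and FLATTEN come from
the thread, and I have not run this against every Pig version.

```pig
-- Hypothetical input: each line of 'greetings.txt' is assumed to hold two
-- tab-separated tuples, e.g. (hello,howdy,hi) and (hola,bonjour).
A = LOAD 'greetings.txt'
    AS (t1:tuple(a:chararray, b:chararray, c:chararray),
        t2:tuple(x:chararray, y:chararray));

-- Pass each tuple's fields to TOBAG individually, so every field becomes
-- its own one-field tuple in the bag. Passing t1 itself would give a bag
-- holding a single tuple. Flattening two bags in the same GENERATE then
-- yields their cross product.
B = FOREACH A GENERATE
        FLATTEN(TOBAG(t1.a, t1.b, t1.c)) AS greeting1,
        FLATTEN(TOBAG(t2.x, t2.y)) AS greeting2;

DUMP B;
```

With the example row from the original question, this should produce the
six pairs listed there, though the order of the rows is not something I
would rely on.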

On 4/5/12 8:23 PM, Scott Carey wrote:
> The documentation is extremely clear:
>
> /**
>   * This class takes a list of items and puts them into a bag
>   * T = foreach U generate TOBAG($0, $1, $2);
>   * It's like saying this:
>   * T = foreach U generate {($0), ($1), ($2)}
>   */
>
>
> Adding conditionals to that seems to complicate the issue and would
> introduce bugs.
>
> What happens with TOBAG(tuple1, tuple2)?
> What happens when TOBAG($0) changes type?  What if its type is different
> across rows?
>
> Each operator should do one simple operation consistently, and not depend
> on the type passed in.
> It's frustrating enough that FLATTEN does two things.  IMO there should be
> one operator that explodes bags, and one that unpacks tuples, not one
> conflated operator that does both -- I have had to debug several issues as
> a result of this or a misunderstanding from new pig users. Making TOBAG do
> one thing for one type of data and something else for others does not make
> pig scripts maintainable or intuitive to follow IMO.
>
> On 4/5/12 4:41 PM, "Jonathan Coveney"<[EMAIL PROTECTED]>  wrote:
>
>> Well, perhaps bug is a heavy-handed word. A poor user experience might be
>> better. I would posit that TOBAG(tuple) nine times out of ten means "make
>> each column a row" instead of "give me a bag with a tuple of a tuple." But
>> I'd love opinions on the matter.
>>
>> 2012/4/5 Scott Carey<[EMAIL PROTECTED]>
>>
>>> On 4/5/12 11:25 AM, "Jonathan Coveney"<[EMAIL PROTECTED]>  wrote:
>>>
>>>> Yup, you guys are right... it's a little annoying, but flatten first,
>>>> then the two TOBAGs, then the flatten. IMHO the TOBAG of a tuple not
>>>> giving you a bag is a bug, but this should work in the meantime.
>>> I can't see how that could be a bug.  What if you want to create a bag
>>> with one tuple in it?
>>>
>>>
>>>> 2012/4/5 Scott Carey<[EMAIL PROTECTED]>
>>>>
>>>>> Isn't it
>>>>>
>>>>> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
>>>>> or
>>>>> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0, t2::$1))
>>>>> ?
>>>>>
>>>>> The inner tuple needs to be unpacked into a list of fields.  TOBAG simply
>>>>> puts each element passed in into a bag, and if you pass t1 in there, it
>>>>> will be a bag with only one item.
>>>>>
>>>>> On 4/4/12 11:43 AM, "Jonathan Coveney"<[EMAIL PROTECTED]>  wrote:
>>>>>
>>>>>> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>>>>>>
>>>>>> 2012/4/4 Eli Finkelshteyn<[EMAIL PROTECTED]>
>>>>>>
>>>>>>> That's for a relation only. Unless I'm missing something, it does not
>>>>>>> work for tuples. What I'm doing would require a FOREACH, I'm thinking.
>>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>>
>>>>>>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>>>>>>
>>>>>>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>>>>>>>
>>>>>>>> -Prashant
>>>>>>>>
>>>>>>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn<[EMAIL PROTECTED]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Folks,
>>>>>>>>> I'm currently trying to do something I figured would be trivial, but
>>>>>>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>>>>>>> missing something. All I want to do is get a cross product of two
>>>>>>>>> tuples.
>>>>>>>>> So for example, given an input of:
>>>>>>>>>
>>>>>>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>>>>>>
>>>>>>>>> I'd get:
>>>>>>>>>
>>>>>>>>> ('hello', 'hola')
>>>>>>>>> ('hello', 'bonjour')
>>>>>>>>> ('howdy', 'hola')
>>>>>>>>> ('howdy', 'bonjour')
>>>>>>>>> ('hi', 'hola')
>>>>>>>>> ('hi', 'bonjour')
>>>>>>>>>
>>>>>>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's
>>>>>>>>> no good 'cause the tuples are first themselves put into new tuples.