NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Cross Product of Two Tuples?


Eli Finkelshteyn 2012-04-04, 18:18
Herbert Mühlburger 2012-04-04, 18:24
Prashant Kommireddi 2012-04-04, 18:24
Eli Finkelshteyn 2012-04-04, 18:40
Jonathan Coveney 2012-04-04, 18:43
Eli Finkelshteyn 2012-04-04, 21:37
Scott Carey 2012-04-05, 17:04
Jonathan Coveney 2012-04-05, 18:25
Scott Carey 2012-04-05, 20:35
Jonathan Coveney 2012-04-05, 23:41
Scott Carey 2012-04-06, 01:23
Re: Cross Product of Two Tuples?
Very much agree. Had that been the case, this would have been a far less
confusing exercise. At least I feel like I have a better grasp now on when
FLATTEN does what, anyway.

On 4/5/12 8:23 PM, Scott Carey wrote:
> The documentation is extremely clear:
>
> /**
>   * This class takes a list of items and puts them into a bag
>   * T = foreach U generate TOBAG($0, $1, $2);
>   * It's like saying this:
>   * T = foreach U generate {($0), ($1), ($2)}
>   */
>
>
> Adding conditionals to that seems like it would complicate the issue and
> introduce bugs.
>
> What happens with TOBAG(tuple1, tuple2)?
> What happens when TOBAG($0) changes type?  What if its type is different
> across rows?
>
> Each operator should do one simple operation consistently, and not depend
> on the type passed in.
> It's frustrating enough that FLATTEN does two things.  IMO there should be
> one operator that explodes bags and one that unpacks tuples, not one
> conflated operator that does both. I have had to debug several issues
> caused by this, or by new pig users misunderstanding it. Making TOBAG do
> one thing for one type of data and something else for others does not make
> pig scripts maintainable or intuitive to follow IMO.
>
> On 4/5/12 4:41 PM, "Jonathan Coveney"<[EMAIL PROTECTED]>  wrote:
>
>> Well, perhaps bug is a heavy-handed word. A poor user experience might be
>> better. I would posit that TOBAG(tuple) nine times out of ten means "make
>> each column a row" instead of "give me a bag with a tuple of a tuple." But
>> I'd love opinions on the matter.
>>
>> 2012/4/5 Scott Carey<[EMAIL PROTECTED]>
>>
>>> On 4/5/12 11:25 AM, "Jonathan Coveney"<[EMAIL PROTECTED]>  wrote:
>>>
>>>> Yup, you guys are right... it's a little annoying, but flatten first,
>>>> then the two TOBAGs, then the flatten. IMHO TOBAG of a tuple not giving
>>>> you a bag is a bug, but this should work in the meantime.
>>> I can't see how that could be a bug.  What if you want to create a bag
>>> with one tuple in it?
>>>
>>>
>>>> 2012/4/5 Scott Carey<[EMAIL PROTECTED]>
>>>>
>>>>> Isn't it
>>>>>
>>>>> FLATTEN(TOBAG(FLATTEN(t1))), FLATTEN(TOBAG(FLATTEN(t2)))
>>>>> or
>>>>> FLATTEN(TOBAG(t1::$0, t1::$1, t1::$2)), FLATTEN(TOBAG(t2::$0, t2::$1))
>>>>> ?
>>>>>
>>>>> The inner tuple needs to be unpacked into a list of fields.  TOBAG simply
>>>>> puts each element passed in into a bag, and if you pass t1 in there, it
>>>>> will be a bag with only one item.
>>>>>
>>>>> On 4/4/12 11:43 AM, "Jonathan Coveney"<[EMAIL PROTECTED]>  wrote:
>>>>>
>>>>>> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross product.
>>>>>>
>>>>>> 2012/4/4 Eli Finkelshteyn<[EMAIL PROTECTED]>
>>>>>>
>>>>>>> That's for a relation only. Unless I'm missing something, it does not
>>>>>>> work for tuples. What I'm doing would require a FOREACH, I'm thinking.
>>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>>
>>>>>>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>>>>>>
>>>>>>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>>>>>>>
>>>>>>>> -Prashant
>>>>>>>>
>>>>>>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <[EMAIL PROTECTED]>
>>>>>>>> wrote:
>>>>>>>>> Hi Folks,
>>>>>>>>> I'm currently trying to do something I figured would be trivial, but
>>>>>>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>>>>>>> missing something. All I want to do is get a cross product of two
>>>>>>>>> tuples.
>>>>>>>>> So for example, given an input of:
>>>>>>>>>
>>>>>>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>>>>>>
>>>>>>>>> I'd get:
>>>>>>>>>
>>>>>>>>> ('hello', 'hola')
>>>>>>>>> ('hello', 'bonjour')
>>>>>>>>> ('howdy', 'hola')
>>>>>>>>> ('howdy', 'bonjour')
>>>>>>>>> ('hi', 'hola')
>>>>>>>>> ('hi', 'bonjour')
>>>>>>>>>
>>>>>>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's
>>>>>>>>> no good because the tuples are first themselves put into new tuples.
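The workaround the thread converges on is to unpack each tuple into fields, wrap those fields with TOBAG, then FLATTEN both bags in the same GENERATE. It can be modeled in a few lines of Python. This is a sketch of the semantics as described in the messages above, not Pig code: a Pig tuple is assumed to map to a Python tuple and a bag to a list of tuples, and the helper names `to_bag`, `flatten_tuple` and `generate_flattened_bags` are hypothetical.

```python
from itertools import product

# Python model of the Pig semantics discussed in this thread (NOT Pig's own code).
# Assumption: a Pig tuple is a Python tuple, a bag is a list of tuples.
# All helper names below are hypothetical.


def to_bag(*items):
    """Model of TOBAG as its javadoc describes it: each argument becomes a
    one-field tuple in the bag, so TOBAG($0, $1, $2) ~ {($0), ($1), ($2)}."""
    return [(item,) for item in items]


def flatten_tuple(t):
    """Model of FLATTEN applied to a tuple: its fields are unpacked into
    separate columns of the enclosing row."""
    return tuple(t)


def generate_flattened_bags(*bags):
    """Model of FLATTEN applied to several bags in one GENERATE: one output
    row per combination of bag elements, i.e. their cross product."""
    return [sum(combo, ()) for combo in product(*bags)]


t1 = ('hello', 'howdy', 'hi')
t2 = ('hola', 'bonjour')

# Naive attempt: TOBAG(t1) wraps the whole tuple, giving a one-element bag.
print(to_bag(t1))  # [(('hello', 'howdy', 'hi'),)]

# Workaround from the thread: unpack each tuple into fields, TOBAG the
# fields, then flatten both bags together to get the cross product.
bag1 = to_bag(*flatten_tuple(t1))
bag2 = to_bag(*flatten_tuple(t2))
for row in generate_flattened_bags(bag1, bag2):
    print(row)
```

The naive call illustrates Scott's point: TOBAG(t1) is a bag holding a single element (the whole tuple). The second part prints the six rows from the original question. Their order is simply what `itertools.product` yields (first bag outermost). It happens to match the listing above, but it is a property of this model only.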
Jonathan Coveney 2012-04-06, 06:45