Re: Cross Product of Two Tuples?
Nah, that doesn't work, because TOBAG wraps the tuple in another tuple, so that:

TOBAG(('hello', 'howdy', 'hi'))

gives

{(('hello', 'howdy', 'hi'))}

And so,

FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))

gets me

('hello', 'howdy', 'hi'), ('hola', 'bonjour')

which is just what I started with.
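
To see why the extra nesting defeats FLATTEN, here is a rough Python model of the behavior described above. It is not Pig code: `to_bag` and `flatten_bag` are hypothetical stand-ins, and the model assumes, as observed above, that TOBAG puts each argument into its own one-field tuple.

```python
def to_bag(*items):
    # Model of TOBAG as described above: every argument becomes a
    # one-field tuple inside the bag (a bag is modeled as a list).
    return [(item,) for item in items]


def flatten_bag(bag):
    # Model of FLATTEN on a bag: one output row per tuple in the bag,
    # with that tuple's fields as the row's fields.
    return [tuple(tup) for tup in bag]


t1 = ('hello', 'howdy', 'hi')

bag = to_bag(t1)
rows = flatten_bag(bag)

# The bag holds a single tuple whose only field is the whole of t1,
# so flattening yields one row containing t1, not three rows.
print(len(rows))
print(rows[0][0] == t1)
```

In this model the flattened result is a single row whose only field is the original tuple, which matches "just what I started with".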

Anyway, to solve this problem, I wrote a quick Python UDF that makes a bag
from a tuple without doubling up the tuple, and then ran FLATTEN on that,
which looks like:

bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
                                    FLATTEN(py_udfs.tupleToBag(t2));

where the Python UDF I'm using is:

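def tupleToBag(tup):
    # Turn each field of the tuple into its own one-field tuple, so the
    # returned list (a bag in Pig) holds one tuple per original field.
    return [tupify(i) for i in tupify(tup)]

def tupify(tup):
    # Leave tuples alone; wrap any other value in a one-field tuple.
    if isinstance(tup, tuple):
        return tup
    return (tup,)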
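
As a quick sanity check outside Pig, the two functions can be run in plain Python. The sketch below repeats them so it is self-contained, and uses `itertools.product` as a stand-in for two FLATTENs in one GENERATE (Pig's documentation describes flattening two bags in the same statement as producing their cross product). The sample values of `t1` and `t2` come from the original question; the `crossed` name is only for illustration.

```python
from itertools import product


def tupify(tup):
    # Leave tuples alone; wrap any other value in a one-field tuple.
    if isinstance(tup, tuple):
        return tup
    return (tup,)


def tupleToBag(tup):
    # One one-field tuple per field of the input tuple.
    return [tupify(i) for i in tupify(tup)]


t1 = ('hello', 'howdy', 'hi')
t2 = ('hola', 'bonjour')

bag1 = tupleToBag(t1)
bag2 = tupleToBag(t2)

# Stand-in for FLATTEN(bag1), FLATTEN(bag2): pair every tuple of bag1
# with every tuple of bag2 and concatenate their fields into one row.
crossed = [a + b for a, b in product(bag1, bag2)]

for row in crossed:
    print(row)
```

Each bag here holds one single-field tuple per original field, so crossing them gives the six pairs asked for in the original question.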

I'll add that into Python PiggyBank as soon as I get a chance to finish
that stuff up.

On 4/4/12 2:43 PM, Jonathan Coveney wrote:
> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
> 2012/4/4 Eli Finkelshteyn<[EMAIL PROTECTED]>
>> That's for a relation only. Unless I'm missing something, it does not work
>> for tuples. What I'm doing would require a FOREACH, I'm thinking.
>> Eli
>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>> -Prashant
>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>>> Hi Folks,
>>>> I'm currently trying to do something I figured would be trivial, but
>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>> missing something. All I want to do is get a cross product of two tuples.
>>>> So for example, given an input of:
>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>> I'd get:
>>>> ('hello', 'hola')
>>>> ('hello', 'bonjour')
>>>> ('howdy', 'hola')
>>>> ('howdy', 'bonjour')
>>>> ('hi', 'hola')
>>>> ('hi', 'bonjour')
>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's no
>>>> good because the tuples are first themselves put into new tuples. So, what
>>>> I'm left with now is writing a dirty and slow Python UDF for this. Is there
>>>> really no better way to do this? I'd think it would be a pretty standard
>>>> task.
>>>> Eli