Pig >> mail # dev >> Re: Cross Product of Two Tuples?


Jonathan Coveney 2012-04-04, 23:36
Re: Cross Product of Two Tuples?
I would say the additional nesting level is a bug.
But we should check if we break stuff with this change.

Cheers,
--
Gianmarco

On Thu, Apr 5, 2012 at 01:36, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> Pig folks: it seems to defy expectations that running TOBAG on a single
> tuple doesn't give you a bag of its fields. I can patch it, but does this
> seem like a fair change?
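A plain-Python model may help make the proposal concrete. This is a sketch, not Pig's implementation: Pig tuples and bags are modelled as Python tuples and lists, and `tobag_current` and `tobag_proposed` are hypothetical names. The first function reproduces the doubled tuple reported in this thread by wrapping every argument in a one-field tuple; the second spreads a lone tuple argument into one single-field tuple per field, which is the behaviour proposed here.

```python
def tobag_current(*args):
    # Model of the reported TOBAG behaviour: every argument, even a tuple,
    # becomes its own one-field tuple in the resulting bag (a Python list).
    return [(arg,) for arg in args]


def tobag_proposed(*args):
    # Hypothetical patched behaviour: a single tuple argument is spread
    # into a bag holding one single-field tuple per field.
    if len(args) == 1 and isinstance(args[0], tuple):
        return [(field,) for field in args[0]]
    # Any other call keeps the current wrapping behaviour.
    return tobag_current(*args)


t1 = ('hello', 'howdy', 'hi')
print(tobag_current(t1))   # [(('hello', 'howdy', 'hi'),)]
print(tobag_proposed(t1))  # [('hello',), ('howdy',), ('hi',)]
```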
>
> 2012/4/4 Eli Finkelshteyn <[EMAIL PROTECTED]>
>
> > Nah, doesn't work because it doubles up the tuple, so that:
> >
> > TOBAG(('hello', 'howdy', 'hi'))
> > returns
> > {(('hello', 'howdy', 'hi'))}
> >
> > And so,
> >
> > FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
> > gets me
> >
> > ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >
> > which is just what I started with.
> >
> > Anyway, to solve this problem, I wrote a quick Python UDF that makes a
> > bag from a tuple without doubling up the tuple, and then ran FLATTEN on
> > that, which looks like:
> >
> > bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
> >     FLATTEN(py_udfs.tupleToBag(t2));
> >
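> > Where the Python UDF I'm using is:
> >
> > @outputSchema("b:bag{}")
> > def tupleToBag(tup):
> >     # Build one single-field tuple per field, so FLATTEN yields the fields
> >     b = [tupify(i) for i in tupify(tup)]
> >     return b
> >
> > def tupify(tup):
> >     # Wrap a bare value in a one-field tuple; return real tuples unchanged
> >     if isinstance(tup, tuple):
> >         return tup
> >     return (tup,)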
> >
> > I'll add that to the Python PiggyBank as soon as I get a chance to finish
> > that stuff up.
> >
> > Eli
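As Jonathan suggests in his 2:43 PM message, flattening two bags in the same GENERATE yields their cross product, and that is why Eli's workaround works. The sketch below replays the UDF's logic in plain CPython, minus the `@outputSchema` decorator, and models the two FLATTENs with `itertools.product`. `flatten_generate` is a hypothetical helper that only approximates Pig's semantics for this case (non-empty bags, rows concatenated left to right).

```python
from itertools import product


def tupify(tup):
    # Wrap a bare value in a one-field tuple; leave real tuples alone.
    if isinstance(tup, tuple):
        return tup
    return (tup,)


def tupleToBag(tup):
    # Same body as Eli's UDF: one single-field tuple per field of tup.
    return [tupify(i) for i in tupify(tup)]


def flatten_generate(bag1, bag2):
    # Hypothetical model of GENERATE FLATTEN(bag1), FLATTEN(bag2):
    # every row of bag1 is paired with every row of bag2, and each pair
    # of tuples is concatenated into one output row.
    return [r1 + r2 for r1, r2 in product(bag1, bag2)]


t1 = ('hello', 'howdy', 'hi')
t2 = ('hola', 'bonjour')
for row in flatten_generate(tupleToBag(t1), tupleToBag(t2)):
    print(row)
```

The rows come out in the same order as the expected output in Eli's original question, because `product` varies the second bag fastest.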
> >
> >
> >
> > On 4/4/12 2:43 PM, Jonathan Coveney wrote:
> >
> >> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross product
> >>
> >> 2012/4/4 Eli Finkelshteyn <[EMAIL PROTECTED]>
> >>
> >>> That's for a relation only. Unless I'm missing something, it doesn't
> >>> work for tuples. What I'm doing would require a FOREACH, I think.
> >>>
> >>> Eli
> >>>
> >>>
> >>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
> >>>
> >>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
> >>>>
> >>>> -Prashant
> >>>>
> >>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>
> >>>>> Hi Folks,
> >>>>>
> >>>>> I'm currently trying to do something I figured would be trivial, but
> >>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
> >>>>> missing something. All I want to do is get a cross product of two
> >>>>> tuples.
> >>>>> So for example, given an input of:
> >>>>>
> >>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >>>>>
> >>>>> I'd get:
> >>>>>
> >>>>> ('hello', 'hola')
> >>>>> ('hello', 'bonjour')
> >>>>> ('howdy', 'hola')
> >>>>> ('howdy', 'bonjour')
> >>>>> ('hi', 'hola')
> >>>>> ('hi', 'bonjour')
> >>>>>
> >>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's
> >>>>> no good because each tuple is first wrapped in a new tuple. So what
> >>>>> I'm left with now is writing a dirty and slow Python UDF for this. Is
> >>>>> there really no better way to do this? I'd think it would be a pretty
> >>>>> standard task.
> >>>>>
> >>>>> Eli
> >>>>>
> >>>>>
> >>>>>
> >
>
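For comparison, the direct UDF Eli was hoping to avoid might look like the sketch below: it returns the pairs as one bag, so a single FLATTEN produces the six rows. Everything here is an assumption for illustration: the name `crossTuples`, the schema string (which presumes both tuples hold chararray fields), and the fallback `outputSchema` stub, which only exists so the file also runs outside Pig; when the file is registered as a Jython UDF, Pig is expected to supply the real decorator.

```python
# Fallback so this file also runs under plain CPython. Inside Pig's Jython
# engine, outputSchema is expected to be predefined, so this stub is skipped.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func  # the schema string is ignored outside Pig
        return wrap


@outputSchema("pairs:bag{t:(a:chararray, b:chararray)}")
def crossTuples(t1, t2):
    # One output tuple per (field of t1, field of t2) pair, ordered with
    # the fields of t2 varying fastest, as in Eli's expected output.
    return [(a, b) for a in t1 for b in t2]


print(crossTuples(('hello', 'howdy', 'hi'), ('hola', 'bonjour')))
```

In Pig this would presumably be called as `crossed = FOREACH split_set GENERATE FLATTEN(py_udfs.crossTuples(t1, t2));`, assuming the same `py_udfs` registration as Eli's script.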
Scott Carey 2012-04-05, 16:58