Lauren Blau 2012-08-14, 09:55
I want to match up tuples from 2 relations. For each key, the 2 relations will always have the same number of tuples, and the tuples match by position (the first tuple in each is a match, the second tuple in each is a match, etc.).
So if I have relation1 = 5,9,7 and relation2 = z,a,d
I want to end up with
relation3 = (5,z),(9,a),(7,d)
I figure I need a way to generate a matching key on the ordered tuples of the relations and then do a cogroup. But I'm stuck on generating the key. Since adding a field is a projection, I assume this has to be done as part of a foreach. But I'm not sure how I can maintain the order while adding a field to each tuple.
Ideas? Thanks, Lauren
Re: add a field, ordered
Gianmarco De Francisci Mo... 2012-08-14, 10:05
Hi,
We are finalizing a feature that would solve your problem, something like ROW_NUMBER in some SQL dialects; we call it RANK. This operator will add a unique, consecutive row number to each tuple in the relation. Then you will be able to join the two relations on the rank field.
For the time being, however, I think there is no easy way to achieve what you want to do.
Cheers, -- Gianmarco
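To make the rank-then-join idea above concrete, here is a minimal sketch in plain Python. It is not Pig code and not how the RANK operator is implemented; it only illustrates the semantics described in the message (number the tuples in order, then join on that number). The helper names `add_row_numbers` and `join_on_row_number` are hypothetical, and the sketch assumes each relation is already held in the order you want to match on.

```python
def add_row_numbers(rows):
    """Attach a consecutive 1-based row number to each row, keeping input order."""
    return [(n, row) for n, row in enumerate(rows, start=1)]


def join_on_row_number(left, right):
    """Inner-join two numbered relations on their row number."""
    right_by_number = dict(right)  # row number -> value
    return [
        (value, right_by_number[n])
        for n, value in left
        if n in right_by_number
    ]


# Sample data taken from the original question.
relation1 = [5, 9, 7]
relation2 = ["z", "a", "d"]

relation3 = join_on_row_number(
    add_row_numbers(relation1),
    add_row_numbers(relation2),
)
print(relation3)  # [(5, 'z'), (9, 'a'), (7, 'd')]
```

The row number plays the role of the generated matching key from the question. In a distributed setting the hard part is assigning those numbers consistently, which is what the operator discussed here is meant to handle.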
Re: add a field, ordered
Lauren Blau 2012-08-14, 10:38
Is the source for it available in the development area? I'd be happy to help if I can. Lauren
Re: add a field, ordered
Alan Gates 2012-08-23, 20:43
Take a look at https://issues.apache.org/jira/browse/PIG-2353
I believe that's the JIRA where they're doing the work.
Alan.