Pig, mail # user - add a field, ordered


Lauren Blau 2012-08-14, 09:55
Gianmarco De Francisci Mo... 2012-08-14, 10:05
Lauren Blau 2012-08-14, 10:38
Re: add a field, ordered
Alan Gates 2012-08-23, 20:43
Take a look at https://issues.apache.org/jira/browse/PIG-2353. I believe that's the JIRA where they're doing the work.

Alan.

On Aug 14, 2012, at 3:38 AM, Lauren Blau wrote:

> Is the source for it available in the development area? I'd be happy to
> help if I can.
> Lauren
>
> On Tue, Aug 14, 2012 at 6:05 AM, Gianmarco De Francisci Morales <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> We are finalizing a feature that would solve your problem, something like
>> ROW_NUMBER in some SQL dialects; we call it RANK.
>> This operator will add a unique consecutive row number to each tuple in the
>> relation.
>> Then you will be able to join the two relations on the rank field.
>>
>> For the time being, however, I think there is no easy way to achieve what
>> you want to do.
>>
>> Cheers,
>> --
>> Gianmarco
>>
>>
>>
>> On Tue, Aug 14, 2012 at 11:55 AM, Lauren Blau <
>> [EMAIL PROTECTED]> wrote:
>>
>>> I want to match up tuples from 2 relations. For each key, the 2 relations
>>> will always have the same number of tuples and match by position (the first
>>> tuple in each is a match, the second tuple in each, etc.).
>>>
>>> so if I have
>>> relation1 = 5,9,7
>>> relation2 = z,a,d
>>>
>>> I want to end up with
>>>
>>> relation3 = (5,z),(9,a),(7,d)
>>>
>>> I figure I need a way to generate a matching key on the ordered tuples of
>>> the relations and then do a cogroup. But I'm stuck on generating the key.
>>> Since adding a field is a projection, I assume this has to be done as part of
>>> a foreach loop. But I'm not sure how I can maintain the order while adding
>>> a field to each tuple.
>>>
>>> ideas?
>>> Thanks,
>>> lauren
>>>
>>
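
The approach discussed in this thread (number the tuples of each relation
consecutively, then join the two relations on that number) can be modelled
in a few lines of plain Python. This is only a sketch of the idea, not Pig
code and not the implementation tracked in PIG-2353. The helper names rank
and join_on_rank are hypothetical, and the sketch assumes each relation is
already held in the intended order.

```python
def rank(relation):
    """Prepend a consecutive row number (starting at 1) to each tuple.

    Hypothetical helper: models the row-numbering idea described in the
    thread and assumes the input list is already in the desired order.
    """
    return [(n,) + tuple(row) for n, row in enumerate(relation, start=1)]


def join_on_rank(left, right):
    """Inner-join two ranked relations on their leading rank field.

    Hypothetical helper: the rank field is dropped from the output.
    """
    right_by_rank = {row[0]: row[1:] for row in right}
    return [
        row[1:] + right_by_rank[row[0]]
        for row in left
        if row[0] in right_by_rank
    ]


# The example data from the original question.
relation1 = [(5,), (9,), (7,)]
relation2 = [("z",), ("a",), ("d",)]

relation3 = join_on_rank(rank(relation1), rank(relation2))
print(relation3)  # [(5, 'z'), (9, 'a'), (7, 'd')]
```

In a distributed job the hard part is the one this sketch takes for granted:
the row numbers only line up if both relations are numbered in a
well-defined order. For the per-key case in the original question, the same
numbering would be applied within each key's group instead of across the
whole relation.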