Re: Access only data from LEFT OUTER JOIN side of joined data without projection prefix
Before I store, I transform my schemas so they no longer contain ':',
which is invalid in Avro field names. For example:

>>> D: {A::id: long,A::time: int,B::id: long,B::time: int}
D = foreach D generate A::id as a_id, A::time as a_time,
                       B::id as b_id, B::time as b_time;
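The field list changes from file to file, so a statement like the one above could be generated from DESCRIBE output rather than written by hand. Below is a minimal Python sketch that is not part of Pig. `parse_describe` and `rename_statement` are hypothetical helpers, and the parser assumes a flat schema with no nested tuples or bags.

```python
from __future__ import annotations


def parse_describe(line: str) -> list[str]:
    """Return field names from a flat DESCRIBE line, e.g.
    'D: {A::id: long,A::time: int}' -> ['A::id', 'A::time'].
    Assumes no nested tuples or bags (their commas would break the split)."""
    body = line[line.index("{") + 1 : line.rindex("}")]
    # Each entry looks like 'A::id: long'; the type follows ': '.
    return [entry.split(": ", 1)[0].strip() for entry in body.split(",")]


def rename_statement(alias: str, fields: list[str]) -> str:
    """Build 'alias = foreach alias generate A::id as a_id, ...;'."""
    parts = []
    for name in fields:
        relation, sep, column = name.rpartition("::")
        if not sep:
            parts.append(name)  # no relation prefix: keep the field as-is
            continue
        # 'A::id' -> 'a_id'; a nested 'C::A::id' -> 'c_a_id'
        new_name = relation.lower().replace("::", "_") + "_" + column
        parts.append(f"{name} as {new_name}")
    return f"{alias} = foreach {alias} generate {', '.join(parts)};"


line = "D: {A::id: long,A::time: int,B::id: long,B::time: int}"
print(rename_statement("D", parse_describe(line)))
# D = foreach D generate A::id as a_id, A::time as a_time, B::id as b_id, B::time as b_time;
```

The output is the same rename as above, printed on a single line, so it can be pasted into the script that does the STORE.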

You might try creating tuples for A and B, then you could access the
field names as A.id, A.time, B.id, B.time. For example:

D = foreach D generate ToTuple(A::id, A::time) as A,
                       ToTuple(B::id, B::time) as B;

That should store as well, and it might be scriptable with a macro.
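Here is a sketch that generates the ToTuple variant the same way, grouping fields by their relation prefix. `tuple_statement` is a hypothetical helper, and it assumes every field carries a `relation::` prefix. Whether the inner tuple fields end up named `id` and `time` depends on the output schema ToTuple reports in your Pig version, and this sketch does not check that.

```python
from __future__ import annotations


def tuple_statement(alias: str, fields: list[str]) -> str:
    """Build 'alias = foreach alias generate ToTuple(A::id, ...) as A, ...;'."""
    groups: dict[str, list[str]] = {}
    for name in fields:
        relation, sep, _ = name.rpartition("::")
        if not sep:
            raise ValueError(f"field {name!r} has no relation prefix")
        groups.setdefault(relation, []).append(name)
    # dicts preserve insertion order, so relations appear in DESCRIBE order
    parts = [f"ToTuple({', '.join(cols)}) as {rel}"
             for rel, cols in groups.items()]
    return f"{alias} = foreach {alias} generate {', '.join(parts)};"


fields = ["A::id", "A::time", "B::id", "B::time"]
print(tuple_statement("D", fields))
# D = foreach D generate ToTuple(A::id, A::time) as A, ToTuple(B::id, B::time) as B;
```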

Russell Jurney http://datasyndrome.com

On Jul 26, 2012, at 8:01 AM, Alan Gates <[EMAIL PROTECTED]> wrote:

> How will you handle ambiguities when there is both an A::b and a B::b?
>
> Alan.
>
> On Jul 26, 2012, at 6:54 AM, Alex Rovner wrote:
>
>> I am proposing to patch AvroStorage to add an option for storing field names without their relation name, so A::b would be saved as "b".
>>
>> Thoughts?
>>
>> On Jul 25, 2012, at 5:48 AM, "Florian Zumkeller-Quast" <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>> I have the following code:
>>>
>>> A = LOAD '$file1' USING AvroStorage();
>>> B = LOAD '$file2' USING AvroStorage();
>>> C = JOIN A BY id LEFT OUTER, B BY id;
>>> SPLIT C INTO D IF B::id IS NULL, E OTHERWISE;
>>>
>>> DESCRIBE shows the following schemas:
>>>
>>> D: {A::id: long,A::time: int,B::id: long,B::time: int}
>>> E: {A::id: long,A::time: int,B::id: long,B::time: int}
>>>
>>> But I can't store D and E using AvroStorage because the field names contain
>>> "::", and ':' is not an allowed character.
>>>
>>> I need a structure like
>>> F: {id: long,time: int}
>>> where id = E::A::id and time = E::A::time.
>>>
>>> The problem is that the number, names and types of the fields may vary.
>>>
>>> So E might look like
>>> E: {A::id: long,A::time: int,A::fieldN1,B::id: long,B::time: int,B::fieldN1: int}
>>>
>>> Thus I can't use
>>>
>>> F = FOREACH … GENERATE …;
>>>
>>> because I don't want to write code for each file type unless I really
>>> need to.
>>>
>>> Can someone advise me on how to get the result I need?
>>>
>>> Thanks!
>>>
>>> With kind regards
>>> Florian Zumkeller-Quast
>>> --
>>> Developer
>>> ADITION technologies AG
>
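A small generator can make the trade-off between Alex's proposal and Alan's question concrete. It drops the relation prefix and stops when two fields would end up with the same name. With `keep="A"`, it produces the kind of projection Florian asks for (`F: {id, time}`). Without `keep`, `A::id` and `B::id` collide. This is a minimal Python sketch: `strip_prefix_statement` is a hypothetical helper, and the field list is assumed to come from DESCRIBE.

```python
from __future__ import annotations


def strip_prefix_statement(alias: str, out: str, fields: list[str],
                           keep: str | None = None) -> str:
    """Build 'out = FOREACH alias GENERATE A::id AS id, ...;'.

    If `keep` is given, only fields whose relation prefix equals `keep`
    are projected. Raises ValueError when two projected fields would
    share a name after the prefix is dropped.
    """
    seen: dict[str, str] = {}
    parts = []
    for name in fields:
        relation, _, column = name.rpartition("::")
        if keep is not None and relation != keep:
            continue  # field belongs to another relation: drop it
        if column in seen:
            raise ValueError(
                f"ambiguous name {column!r}: {seen[column]} and {name}")
        seen[column] = name
        parts.append(f"{name} AS {column}")
    if not parts:
        raise ValueError("no fields selected")
    return f"{out} = FOREACH {alias} GENERATE {', '.join(parts)};"


fields = ["A::id", "A::time", "B::id", "B::time"]
print(strip_prefix_statement("E", "F", fields, keep="A"))
# F = FOREACH E GENERATE A::id AS id, A::time AS time;
```

Failing loudly on a collision is one possible answer to Alan's question. Silently picking one side would be another option, but that would hide data loss.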