Pig, mail # user - Access only data from LEFT OUTER JOIN side of joined data without projection prefix


Re: Access only data from LEFT OUTER JOIN side of joined data without projection prefix
Alan Gates 2012-07-25, 14:22
Basically you need to transform the schema, not the data.  The easiest way I can think of to do that is a UDF whose outputSchema function renames the columns.  The exec call can then be a simple pass-through.
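To make the renaming rule concrete, here is a minimal sketch in plain Python. It does not use the Pig API: it only models what an outputSchema function would compute, working on a list of alias strings. The names rename_left_side and pass_through are hypothetical, and dropping the B-side columns is an assumption taken from the question, which goes slightly beyond a pure pass-through.

```python
def rename_left_side(aliases, left="A"):
    """Pick the fields that come from relation `left` and strip the
    'left::' prefix. Returns (positions, new_names)."""
    prefix = left + "::"
    positions, names = [], []
    for i, alias in enumerate(aliases):
        if alias.startswith(prefix):
            positions.append(i)                 # where the field sits in the joined tuple
            names.append(alias[len(prefix):])   # 'A::id' becomes 'id'
    return positions, names


def pass_through(row, positions):
    """Copy the selected fields unchanged; the data itself is not transformed."""
    return tuple(row[i] for i in positions)


# Aliases as reported by DESCRIBE in the question.
aliases = ["A::id", "A::time", "B::id", "B::time"]
positions, names = rename_left_side(aliases)

# A made-up joined row with no match on the B side.
row = (1, 10, None, None)
projected = pass_through(row, positions)
```

Because the rule works on whatever aliases it is given, it does not need to know the number, names, or types of the fields in advance. In a real UDF the same logic would live in the schema callback, with the field types carried over unchanged.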

If you wanted to, you could also have it consolidate the join keys.  You imply you would like to consolidate other columns as well (E::A::time in your example), but that is not valid.  Since time is not a join key, it will not necessarily be the same in A and B.
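A small illustration of that last point, again as a plain Python sketch rather than Pig. The helper left_outer_join and the sample rows are made up for this example. After joining on id, the id values agree on both sides of a matched row, but the time values need not.

```python
def left_outer_join(left_rows, right_rows, key):
    """Naive left outer join on `key`; unmatched left rows pair with None."""
    joined = []
    for l in left_rows:
        matches = [r for r in right_rows if r[key] == l[key]]
        if not matches:
            joined.append((l, None))
        for r in matches:
            joined.append((l, r))
    return joined


# Made-up sample data: id 1 exists on both sides, id 2 only in A.
A = [{"id": 1, "time": 100}, {"id": 2, "time": 200}]
B = [{"id": 1, "time": 150}]

rows = left_outer_join(A, B, "id")
matched_left, matched_right = rows[0]

# The join key is equal by construction, so it can be consolidated safely.
same_id = matched_left["id"] == matched_right["id"]
# A non-key column can differ, so merging it would silently drop one value.
same_time = matched_left["time"] == matched_right["time"]
```

Here same_id is true and same_time is false, which is why only the join keys are candidates for consolidation.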

Alan.

On Jul 25, 2012, at 2:48 AM, Florian Zumkeller-Quast wrote:

> Hello,
> I got the following code:
>
> A = LOAD '$file1' USING AvroStorage();
> B = LOAD '$file2' USING AvroStorage();
> C = JOIN A BY id LEFT OUTER, B BY id;
> SPLIT C INTO D IF B::id IS NULL, E OTHERWISE;
>
> DESCRIBE shows the following data structure
>
> D: {A::id: long,A::time: int,B::id: long,B::time: int}
> E: {A::id: long,A::time: int,B::id: long,B::time: int}
>
> But I can't store D and E using AvroStorage because the field names contain
> "::", which is not an allowed character.
>
> I need a structure like
> F: {id: long,time: int}
> where id = E::A::id and time = E::A::time.
>
> The problem is: The number, name and type of fields may vary.
>
> So E might look like
> E: {A::id: long,A::time: int,A::fieldN1,B::id: long,B::time: int,B::fieldN1: int}
>
> Thus I can't use
>
> F = FOREACH … GENERATE …;
>
> because I don't want to write code for each file type unless I really
> need to.
>
> Can someone give me advice on how to get the result I need?
>
> Thanks!
>
> With kind regards
> Florian Zumkeller-Quast