Pig >> mail # user >> Access only data from LEFT OUTER JOIN side of joined data without projection prefix


Florian Zumkeller-Quast 2012-07-25, 09:48
Alan Gates 2012-07-25, 14:22
Florian Zumkeller-Quast 2012-07-26, 15:55
Florian Zumkeller-Quast 2012-07-27, 15:34
Alex Rovner 2012-07-26, 13:54
Alan Gates 2012-07-26, 15:01
Re: Access only data from LEFT OUTER JOIN side of joined data without projection prefix
Before I store, I rename the fields so the schema no longer contains ':',
which is not a valid character in Avro names. For example:

>>> D: {A::id: long,A::time: int,B::id: long,B::time: int}
D = foreach D generate A::id as a_id, A::time as a_time,
    B::id as b_id, B::time as b_time;
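
When the field list varies between inputs, a statement like the one above
could be generated rather than written by hand. Below is a minimal Python
sketch, not part of Pig or AvroStorage: it takes one line of DESCRIBE output
and builds the renaming foreach. The helper names parse_describe and
rename_statement are made up for illustration, and the parser assumes a flat
schema of simple types (no nested bags, tuples, or maps).

```python
def parse_describe(line):
    """Split DESCRIBE output such as 'D: {A::id: long,A::time: int}'
    into (alias, [(field_name, type), ...]). Flat schemas only."""
    alias, body = line.split(":", 1)
    body = body.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a flat schema in braces")
    fields = []
    for part in body[1:-1].split(","):
        # The type follows the last ':'; the name may itself contain '::'.
        name, ftype = part.rsplit(":", 1)
        fields.append((name.strip(), ftype.strip()))
    return alias.strip(), fields


def rename_statement(describe_line):
    """Build a foreach that renames 'A::id' to 'a_id' and so on."""
    alias, fields = parse_describe(describe_line)
    renamed = []
    for name, _ in fields:
        prefix, sep, bare = name.rpartition("::")
        if sep:
            # Lowercase only the relation prefix; keep the field name as is.
            new_name = prefix.replace("::", "_").lower() + "_" + bare
        else:
            new_name = bare
        renamed.append("{0} as {1}".format(name, new_name))
    return "{0} = foreach {0} generate {1};".format(alias, ", ".join(renamed))


if __name__ == "__main__":
    print(rename_statement(
        "D: {A::id: long,A::time: int,B::id: long,B::time: int}"))
```

The generated text could then be pasted into the script or supplied some
other way; which fits best depends on how the job is launched.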

Alternatively, you might try wrapping the fields of A and B in tuples, so
that you can access them as A.id, A.time, B.id, B.time. For example:

D = foreach D generate ToTuple(A::id, A::time) as A,
    ToTuple(B::id, B::time) as B;

That will store too, and could probably be scripted with a macro.
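
If a macro turns out to be awkward, the tuple-wrapping statement could also
be generated outside Pig. The following is only a rough Python sketch under
a few assumptions: tuple_statement is a hypothetical helper name, the field
names are passed in already extracted from the schema, every field carries a
relation prefix, and the function name is spelled ToTuple as in this message.

```python
def tuple_statement(alias, field_names):
    """Group prefixed fields by relation and wrap each group in ToTuple."""
    groups = {}  # relation prefix -> field names, in first-seen order
    for name in field_names:
        relation, sep, _ = name.rpartition("::")
        if not sep:
            raise ValueError("field %r has no relation prefix" % name)
        groups.setdefault(relation, []).append(name)
    parts = []
    for relation, names in groups.items():
        # A nested prefix such as 'E::A' cannot be used as an alias as is,
        # so '::' is replaced with '_' here (an arbitrary choice).
        tuple_alias = relation.replace("::", "_")
        parts.append("ToTuple({0}) as {1}".format(", ".join(names), tuple_alias))
    return "{0} = foreach {0} generate {1};".format(alias, ", ".join(parts))


if __name__ == "__main__":
    print(tuple_statement("D", ["A::id", "A::time", "B::id", "B::time"]))
```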

Russell Jurney http://datasyndrome.com

On Jul 26, 2012, at 8:01 AM, Alan Gates <[EMAIL PROTECTED]> wrote:

> How will you handle ambiguities when there is an A::b and B::b?
>
> Alan.
>
> On Jul 26, 2012, at 6:54 AM, Alex Rovner wrote:
>
>> I am proposing to patch AvroStorage to add an option for storing field names without their relation prefix, so that A::b would be saved as "b".
>>
>> Thoughts?
>>
>> On Jul 25, 2012, at 5:48 AM, "Florian Zumkeller-Quast" <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>> I have the following code:
>>>
>>> A = LOAD '$file1' USING AvroStorage();
>>> B = LOAD '$file2' USING AvroStorage();
>>> C = JOIN A BY id LEFT OUTER, B BY id;
>>> SPLIT C INTO D IF B::id IS NULL, E OTHERWISE;
>>>
>>> DESCRIBE shows the following data structure:
>>>
>>> D: {A::id: long,A::time: int,B::id: long,B::time: int}
>>> E: {A::id: long,A::time: int,B::id: long,B::time: int}
>>>
>>> But I can't store D and E using AvroStorage because the field names contain
>>> "::", which is not allowed in Avro field names.
>>>
>>> I need a structure like
>>> F: {id: long,time: int}
>>> where id = E::A::id and time = E::A::time.
>>>
>>> The problem is that the number, names, and types of the fields may vary.
>>>
>>> So E might look like
>>> E: {A::id: long,A::time: int,A::fieldN1: int,B::id: long,B::time: int,B::fieldN1: int}
>>>
>>> Thus I can't use
>>>
>>> F = FOREACH … GENERATE …;
>>>
>>> because I don't want to write code for each file type unless I really
>>> need to.
>>>
>>> Can someone advise me on how to get the result I need?
>>>
>>> Thanks!
>>>
>>> With kind regards
>>> Florian Zumkeller-Quast
>>> --
>>> Developer
>>> ________________________________________________________
>>>
>>> ADITION technologies AG
>>> Schwarzwaldstraße 78b
>>> 79117 Freiburg
>>>
>>> http://www.adition.com
>>>
>>> T +49 / (0)761 / 88147 - 30
>>> F +49 / (0)761 / 88147 - 77
>>> SUPPORT +49  / (0)1805 - ADITION
>>>
>>> (landline rate 14 ct/min; mobile rates at most 42 ct/min)
>>>
>>> Registered at the Düsseldorf District Court (Amtsgericht Düsseldorf) under HRB 54076
>>> Executive board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
>>> Chairman of the supervisory board: Daniel Raimer, attorney at law
>>> VAT ID no.: DE 218 858 434
>
Alex Rovner 2012-07-27, 14:01