Pig, mail # user - Does pig support in clause?


Re: Does pig support in clause?
Ruslan Al-Fakikh 2012-07-04, 13:53
Hi Johannes,

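Try this:
C = LOAD 'in.dat' AS (A1);   -- previously generated data
A = LOAD 'in2.dat' AS (A1);  -- new data

-- Keep every row of A; C's fields are null where A has no match in C.
joined = JOIN A BY A1 LEFT OUTER, C BY A1;

DESCRIBE joined;

-- Rows without a match in C are the new entries.
newEntries = FILTER joined BY C::A1 IS NULL;

DUMP newEntries;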
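
For comparison, the same "not in C" filter can also be sketched with COGROUP instead of an outer join. This is only an illustration, assuming the two input files from the quoted message below; the relation names grouped and onlyNew are made up for the example. After a COGROUP, C is a bag field in each group, so the built-in IsEmpty can be applied to it directly:

```pig
C = LOAD 'in.dat' AS (A1);   -- previously generated data
A = LOAD 'in2.dat' AS (A1);  -- new data

-- One row per distinct A1 value, with a bag of matching rows from each input.
grouped = COGROUP A BY A1, C BY A1;

-- Keep only the keys that never occur in C.
onlyNew = FILTER grouped BY IsEmpty(C);

-- Turn the bag of A rows back into plain rows.
newEntries = FOREACH onlyNew GENERATE FLATTEN(A);

DUMP newEntries;
```

With the sample files this should leave the values 10, 11, 12 and 13, in no guaranteed order. Replacing IsEmpty(C) with NOT IsEmpty(C) would instead keep the rows of A whose key does occur in C, which is the IN-style semi-join from the original question.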

Ruslan

On Wed, Jul 4, 2012 at 4:42 PM, Johannes Schwenk
<[EMAIL PROTECTED]> wrote:
> Hi Alan,
>
> I'd like to use this method to exclude records from my output that are
> already present in previously computed data. So I tried your
> suggestion like this:
>
> grunt> cat in.dat
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> grunt> C = LOAD 'in.dat' AS (A1); -- previously generated data
> grunt> cat in2.dat
> 12
> 2
> 13
> 1
> 10
> 9
> 11
> 8
> grunt> A = LOAD 'in2.dat' AS (A1); -- new data
> grunt> B1 = join A by A1, C by A1;
> grunt> B2 = filter B1 by SIZE(C) == 0;
>
> Which gives me this error:
>
> 2012-07-04 14:36:16,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1200: Pig script failed to parse:
> <line 14, column 23> Invalid scalar projection: C : A column needs to be
> projected from a relation for it to be used as a scalar
> Details at logfile: /home/schwenk/pig-0.10.0/pig_1341403702015.log
>
> The relevant pig stack trace from the logfile can be found at
>
> http://pastebin.com/MxPfduWS
>
> What am I doing wrong?
>
> Greetings,
> Johannes
>
> On 25.06.2012 18:39, Alan Gates wrote:
>> This type of in is really a semi-join.  So you could rewrite this as:
>>
>> B1 = join A by A1, C by A1;
>> B2 = filter B1 by SIZE(C) > 0;
>> B = foreach B2 flatten(A);
>>
>> Alan.
>>
>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>>
>>> Dear all,
>>>
>>> In SQL, there is an IN clause, which is used to check whether a value
>>> is in a set. Does Pig have an equivalent IN clause? For example:
>>>
>>> B = filter A by A1 in C;
>>>
>>> A, B, and C are relation names, and A1 is a column name of A.
>>>
>>> Thanks!
>>>
>>> Yong
>>
>
>
>
> Johannes Schwenk
>