Pig >> mail # user >> Does pig support in clause?


yonghu 2012-06-25, 09:50
Alan Gates 2012-06-25, 16:39
Russell Jurney 2012-06-25, 17:17
Alan Gates 2012-06-25, 17:50
Gianmarco De Francisci Mo... 2012-06-26, 05:56
Alan Gates 2012-06-26, 16:56
Johannes Schwenk 2012-07-04, 12:42
Ruslan Al-Fakikh 2012-07-04, 13:53
Re: Does pig support in clause?
Thank you very much Ruslan! That works well!

Greetings,
Johannes

On 04.07.2012 15:53, Ruslan Al-Fakikh wrote:
> Hi Johannes,
>
> Try this
> C = LOAD 'in.dat' AS (A1);
> A = LOAD 'in2.dat' AS (A1);
>
> joined = JOIN A BY A1 LEFT OUTER, C BY A1;
>
> DESCRIBE joined;
>
> newEntries = FILTER joined BY C::A1 IS NULL;
>
> DUMP newEntries;
>
> Ruslan
>
> On Wed, Jul 4, 2012 at 4:42 PM, Johannes Schwenk
> <[EMAIL PROTECTED]> wrote:
>> Hi Alan,
>>
>> I'd like to use this method to not include records in my output that are
>> already present in previously computed data. So I tried to use your
>> suggestion like this:
>>
>> grunt> cat in.dat
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
>> 7
>> 8
>> 9
>> grunt> C = LOAD 'in.dat' AS (A1); -- previously generated data
>> grunt> cat in2.dat
>> 12
>> 2
>> 13
>> 1
>> 10
>> 9
>> 11
>> 8
>> grunt> A = LOAD 'in2.dat' AS (A1); -- new data
>> grunt> B1 = join A by A1, C by A1;
>> grunt> B2 = filter B1 by SIZE(C) == 0;
>>
>> Which gives me this error:
>>
>> 2012-07-04 14:36:16,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1200: Pig script failed to parse:
>> <line 14, column 23> Invalid scalar projection: C : A column needs to be
>> projected from a relation for it to be used as a scalar
>> Details at logfile: /home/schwenk/pig-0.10.0/pig_1341403702015.log
>>
>> The relevant pig stack trace from the logfile can be found at
>>
>> http://pastebin.com/MxPfduWS
>>
>> What am I doing wrong?
>>
>> Greetings,
>> Johannes
>>
>> On 25.06.2012 18:39, Alan Gates wrote:
>>> This type of IN is really a semi-join, so you could rewrite it as:
>>>
>>> B1 = join A by A1, C by A1;
>>> B2 = filter B1 by SIZE(C) > 0;
>>> B = foreach B2 flatten(A);
>>>
>>> Alan.
>>>
>>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>>>
>>>> Dear all,
>>>>
>>>> In SQL, there is an IN clause that checks whether a value is in a
>>>> set. Does Pig have a similar IN clause? For example:
>>>>
>>>> B = filter A by A1 in C;
>>>>
>>>> A, B, and C are relation names, and A1 is a column name of A.
>>>>
>>>> Thanks!
>>>>
>>>> Yong
>>>
>>
>>
>>
>> Johannes Schwenk
>>
>> --
>> Software Developer (Reporting)
>> ________________________________________________________
>>
>> ADITION technologies AG
>> Schwarzwaldstraße 78b
>> 79117 Freiburg
>>
>> http://www.adition.com
>>
>> T +49 / (0)761 / 88147 - 30
>> F +49 / (0)761 / 88147 - 77
>> SUPPORT +49  / (0)1805 - ADITION
>>
>> (Landline rate 14 ct/min; mobile rates at most 42 ct/min)
>>
>> Registered at the Düsseldorf District Court under HRB 54076
>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
>> Chairman of the Supervisory Board: Daniel Raimer, attorney at law
>> VAT ID No.: DE 218 858 434
>>
>>
>>
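
A note on why the SIZE(C) filter in the quoted attempt fails, with a COGROUP variant that may be closer to what the semi-join suggestion was aiming for. After a JOIN, each result row is a flat tuple with the fields A::A1 and C::A1, so no bag named C is left to measure. Pig then appears to treat C as a relation used as a scalar, which would explain the "Invalid scalar projection" error. COGROUP keeps one bag per input relation, so emptiness can be tested directly. This is only a sketch: it assumes the same in.dat and in2.dat shown in the thread, the int type on A1 is an assumption, and the aliases grouped, with_match, without_match, and B_not_in are made up for illustration.

```pig
-- Previously generated data and new data, as in the thread.
-- Declaring A1 as int is an assumption; the original scripts left it untyped.
C = LOAD 'in.dat' AS (A1:int);
A = LOAD 'in2.dat' AS (A1:int);

-- COGROUP yields one tuple per key: (group, {bag of A rows}, {bag of C rows}).
grouped = COGROUP A BY A1, C BY A1;

-- "A1 IN C": keep keys whose C bag is non-empty, then unnest the A rows.
-- Keys that occur only in C have an empty A bag and vanish on FLATTEN.
with_match = FILTER grouped BY NOT IsEmpty(C);
B = FOREACH with_match GENERATE FLATTEN(A);

-- "A1 NOT IN C": keep keys whose C bag is empty.
without_match = FILTER grouped BY IsEmpty(C);
B_not_in = FOREACH without_match GENERATE FLATTEN(A);

DUMP B_not_in;
```

With the sample files from the thread, B_not_in should hold the keys 10, 11, 12, and 13 (in no guaranteed order), the same keys that the LEFT OUTER join with the IS NULL filter selects. One difference from a plain JOIN: a key that appears several times in C does not multiply the rows coming from A.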


Hien Luu 2012-06-25, 14:56