Pig, mail # user - Does pig support in clause?


Re: Does pig support in clause?
Johannes Schwenk 2012-07-04, 12:42
Hi Alan,

I'd like to use this method to exclude records from my output that are
already present in previously computed data. So I tried your
suggestion like this:

grunt> cat in.dat
1
2
3
4
5
6
7
8
9
grunt> C = LOAD 'in.dat' AS (A1); -- previously generated data
grunt> cat in2.dat
12
2
13
1
10
9
11
8
grunt> A = LOAD 'in2.dat' AS (A1); -- new data
grunt> B1 = join A by A1, C by A1;
grunt> B2 = filter B1 by SIZE(C) == 0;

Which gives me this error:

2012-07-04 14:36:16,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: Pig script failed to parse:
<line 14, column 23> Invalid scalar projection: C : A column needs to be
projected from a relation for it to be used as a scalar
Details at logfile: /home/schwenk/pig-0.10.0/pig_1341403702015.log

The relevant pig stack trace from the logfile can be found at

http://pastebin.com/MxPfduWS

What am I doing wrong?
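
A possible explanation, offered as a sketch rather than a confirmed diagnosis: a plain join yields a flat relation whose fields are A::A1 and C::A1, so after the join C is no longer a bag that SIZE could be applied to, and an inner join would in any case drop the unmatched records that should be kept. COGROUP keeps one bag per input relation for every key, which makes the "no match in C" test possible. The relation names B1, B2 and B are taken from the quoted suggestion, and IsEmpty is a Pig built-in function; the files are assumed to be the same in.dat and in2.dat shown above.

```
-- Sketch only: assumes the same in.dat / in2.dat files as above.
C = LOAD 'in.dat' AS (A1);   -- previously generated data
A = LOAD 'in2.dat' AS (A1);  -- new data

-- COGROUP keeps one bag per input relation for every key,
-- so the bag for C is empty when a key occurs only in A.
B1 = COGROUP A BY A1, C BY A1;

-- Keep only the keys that have no match in C (an anti-join).
B2 = FILTER B1 BY IsEmpty(C);

-- Unnest the surviving records of A.
B = FOREACH B2 GENERATE FLATTEN(A);
DUMP B;
```

With the sample data above this should leave the values 10, 11, 12 and 13, in no guaranteed order. Keys that occur only in C have an empty A bag, so they contribute nothing after the FLATTEN.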

Greetings,
Johannes

On 25.06.2012 18:39, Alan Gates wrote:
> This type of in is really a semi-join.  So you could rewrite this as:
>
> B1 = join A by A1, C by A1;
> B2 = filter B1 by SIZE(C) > 0;
> B = foreach B2 flatten(A);
>
> Alan.
>
> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>
>> Dear all,
>>
>> In SQL, there is an IN clause that checks whether a value is in a
>> set. Does Pig have an equivalent IN clause? For example:
>>
>> B = filter A by A1 in C;
>>
>> A, B, and C are relation names, and A1 is a column name of A.
>>
>> Thanks!
>>
>> Yong
>

Johannes Schwenk

--
Software Developer (Reporting)
________________________________________________________

ADITION technologies AG
Schwarzwaldstraße 78b
79117 Freiburg

http://www.adition.com

T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49  / (0)1805 - ADITION

(Landline rate 14 ct/min; mobile rates at most 42 ct/min)

Registered at the Düsseldorf District Court under HRB 54076
Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the Supervisory Board: Attorney Daniel Raimer
VAT ID No.: DE 218 858 434