Pig >> mail # user >> Does pig support in clause?


yonghu 2012-06-25, 09:50
Alan Gates 2012-06-25, 16:39
Russell Jurney 2012-06-25, 17:17
Alan Gates 2012-06-25, 17:50
Gianmarco De Francisci Mo... 2012-06-26, 05:56
Alan Gates 2012-06-26, 16:56
Re: Does pig support in clause?
Hi Alan,

I'd like to use this method to exclude records from my output that are
already present in previously computed data. So I tried to adapt your
suggestion like this:

grunt> cat in.dat
1
2
3
4
5
6
7
8
9
grunt> C = LOAD 'in.dat' AS (A1); -- previously generated data
grunt> cat in2.dat
12
2
13
1
10
9
11
8
grunt> A = LOAD 'in2.dat' AS (A1); -- new data
grunt> B1 = join A by A1, C by A1;
grunt> B2 = filter B1 by SIZE(C) == 0;

Which gives me this error:

2012-07-04 14:36:16,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: Pig script failed to parse:
<line 14, column 23> Invalid scalar projection: C : A column needs to be
projected from a relation for it to be used as a scalar
Details at logfile: /home/schwenk/pig-0.10.0/pig_1341403702015.log

The relevant pig stack trace from the logfile can be found at

http://pastebin.com/MxPfduWS

What am I doing wrong?

Greetings,
Johannes

On 25.06.2012 at 18:39, Alan Gates wrote:
> This type of in is really a semi-join.  So you could rewrite this as:
>
> B1 = join A by A1, C by A1;
> B2 = filter B1 by SIZE(C) > 0;
> B = foreach B2 flatten(A);
>
> Alan.
>
> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>
>> Dear all,
>>
>> In SQL there is an IN clause, which checks whether a value is in a
>> set. Does Pig have an equivalent IN clause? For example:
>>
>> B = filter A by A1 in C;
>>
>> A, B and C are relation names, and A1 is a column name of A.
>>
>> Thanks!
>>
>> Yong
>

Johannes Schwenk
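
Note on the parse error above: after a JOIN, B1 is a flat relation whose
fields are A::A1 and C::A1, so there is no bag named C to pass to SIZE.
Pig most likely reads the bare C as a relation used as a scalar, which is
what ERROR 1200 complains about. A minimal sketch of one way to get the
intended result, assuming the same in.dat and in2.dat as above, is to use
COGROUP instead of JOIN. The aliases G and N are names chosen here for
illustration and do not appear in the thread.

```pig
A = LOAD 'in2.dat' AS (A1);   -- new data
C = LOAD 'in.dat' AS (A1);    -- previously generated data

-- COGROUP yields one row per key: (group, {A rows}, {C rows})
G = COGROUP A BY A1, C BY A1;

-- keep only the keys that have no matching row in C (NOT IN semantics)
N = FILTER G BY IsEmpty(C);

-- unnest the A rows of the surviving groups
B = FOREACH N GENERATE FLATTEN(A);
DUMP B;
```

With the sample files shown above, B should contain the rows 10, 11, 12
and 13, in no guaranteed order. For the original IN semantics, filtering
with NOT IsEmpty(C) instead should work the same way: keys that occur only
in C have an empty A bag, and FLATTEN of an empty bag produces no rows.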


Ruslan Al-Fakikh 2012-07-04, 13:53
Johannes Schwenk 2012-07-04, 14:01
Hien Luu 2012-06-25, 14:56