Pig >> mail # user >> Does pig support in clause?


Re: Does pig support in clause?
Hi Johannes,

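Try this:

C = LOAD 'in.dat' AS (A1);   -- previously generated data
A = LOAD 'in2.dat' AS (A1);  -- new data

-- keep every row of A; C's columns are null where A1 has no match in C
joined = JOIN A BY A1 LEFT OUTER, C BY A1;

DESCRIBE joined;

-- rows without a match in C are the new entries
newEntries = FILTER joined BY C::A1 IS NULL;

DUMP newEntries;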
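
The error in your script comes from using C inside the FILTER expression: after a plain JOIN the result has no bag field named C, so Pig reads C as a relation used as a scalar and asks for a projected column. SIZE(C) only makes sense on the bags that COGROUP produces. If you prefer that style, here is a rough sketch of the same anti-join with COGROUP. It assumes the same files and the A and C aliases from above; the names grouped, onlyNew and result are made up for illustration.

```pig
C = LOAD 'in.dat' AS (A1);   -- previously generated data
A = LOAD 'in2.dat' AS (A1);  -- new data

-- one row per distinct A1 value, with a bag of matching tuples from each input
grouped = COGROUP A BY A1, C BY A1;

-- keep only the keys that have no tuples in C
onlyNew = FILTER grouped BY IsEmpty(C);

-- unpack the bag of A tuples back into plain rows
result = FOREACH onlyNew GENERATE FLATTEN(A);

DUMP result;
```

With your sample files this should leave only the values 10, 11, 12 and 13, in no guaranteed order. Unlike the join version, the result carries just A's column rather than A::A1 plus a null C::A1.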

Ruslan

On Wed, Jul 4, 2012 at 4:42 PM, Johannes Schwenk
<[EMAIL PROTECTED]> wrote:
> Hi Alan,
>
> I'd like to use this method to exclude records from my output that are
> already present in previously computed data. So I tried your suggestion
> like this:
>
> grunt> cat in.dat
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> grunt> C = LOAD 'in.dat' AS (A1); -- previously generated data
> grunt> cat in2.dat
> 12
> 2
> 13
> 1
> 10
> 9
> 11
> 8
> grunt> A = LOAD 'in2.dat' AS (A1); -- new data
> grunt> B1 = join A by A1, C by A1;
> grunt> B2 = filter B1 by SIZE(C) == 0;
>
> Which gives me this error:
>
> 2012-07-04 14:36:16,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1200: Pig script failed to parse:
> <line 14, column 23> Invalid scalar projection: C : A column needs to be
> projected from a relation for it to be used as a scalar
> Details at logfile: /home/schwenk/pig-0.10.0/pig_1341403702015.log
>
> The relevant pig stack trace from the logfile can be found at
>
> http://pastebin.com/MxPfduWS
>
> What am I doing wrong?
>
> Greetings,
> Johannes
>
> On 25.06.2012 18:39, Alan Gates wrote:
>> This type of in is really a semi-join.  So you could rewrite this as:
>>
>> B1 = join A by A1, C by A1;
>> B2 = filter B1 by SIZE(C) > 0;
>> B = foreach B2 flatten(A);
>>
>> Alan.
>>
>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>>
>>> Dear all,
>>>
>>> In SQL there is an IN clause, which checks whether a value is in a
>>> set. Does Pig have the same clause? For example:
>>>
>>> B = filter A by A1 in C;
>>>
>>> A, B and C are relation names, and A1 is a column name of A.
>>>
>>> Thanks!
>>>
>>> Yong
>>
>