Pig >> mail # user >> Does pig support in clause?


Re: Does pig support in clause?
Agreed.  And with some optimization we could make semi-join more efficient than this since it only needs to keep one record per key per map instead of all the records for a key.

Alan.

On Jun 25, 2012, at 10:17 AM, Russell Jurney wrote:

> This could be a cool rewrite feature like CUBE/SAMPLE.
>
> Russell Jurney http://datasyndrome.com
>
> On Jun 25, 2012, at 9:39 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>> This type of IN is really a semi-join.  So you could rewrite this as:
>>
>> B1 = cogroup A by A1, C by A1;
>> B2 = filter B1 by SIZE(C) > 0;
>> B = foreach B2 generate flatten(A);
>>
>> Alan.
>>
>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>>
>>> Dear all,
>>>
>>> In SQL, there is an IN clause which is used to check whether a value
>>> is in a set. Does Pig also have the same IN clause? Such as:
>>>
>>> B = filter A by A1 in C;
>>>
>>> A, B, and C are relation names and A1 is a column name of A.
>>>
>>> Thanks!
>>>
>>> Yong
>>
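
To make the semi-join idea from this thread concrete, here is a minimal Python sketch of the semantics being discussed. It is not how Pig implements anything; the function name semi_join and the sample relations A and C are made up for illustration. It also shows the point about keeping one entry per key: only the distinct keys of C are held in memory, not every C record.

```python
def semi_join(a_rows, c_rows, a_key, c_key):
    """Yield each row of a_rows whose key also appears in c_rows.

    This is a left semi-join: rows of A are never duplicated, no matter
    how many matching rows C holds for the same key.
    """
    # Keep one entry per distinct key of C instead of all C records.
    c_keys = {c_key(row) for row in c_rows}
    for row in a_rows:
        if a_key(row) in c_keys:
            yield row


# Hypothetical sample data: the first field plays the role of A1.
A = [(1, "x"), (2, "y"), (2, "z"), (3, "w")]
C = [(2,), (2,), (3,), (4,)]

B = list(semi_join(A, C, a_key=lambda r: r[0], c_key=lambda r: r[0]))
print(B)
```

Note that key 2 appears twice in C, but the matching rows of A each come out once, which is the behaviour an IN filter would have.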