Pig >> mail # user >> Does pig support in clause?


yonghu 2012-06-25, 09:50
Alan Gates 2012-06-25, 16:39
Russell Jurney 2012-06-25, 17:17
Alan Gates 2012-06-25, 17:50
Gianmarco De Francisci Mo... 2012-06-26, 05:56
Re: Does pig support in clause?
As of 0.10 there are UDFs for building bloom filters.  Those could be used to construct a bloom join.

Alan.
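
A minimal sketch of the bloom-join idea, written in Python rather than with Pig's bloom filter UDFs, whose exact signatures are not shown in this thread. The `BloomFilter` class, its default sizes, and the `bloom_join` helper are hypothetical names chosen for illustration. The filter is built from the keys of the small relation and used to discard rows of the large relation that cannot match before the exact join runs.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustration values, not tuned defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def bloom_join(large, small, large_key, small_key):
    """Join two lists of tuples, pre-filtering `large` with a Bloom filter."""
    bloom = BloomFilter()
    index = {}
    for row in small:
        bloom.add(row[small_key])
        index.setdefault(row[small_key], []).append(row)

    joined = []
    for row in large:
        key = row[large_key]
        if not bloom.might_contain(key):
            continue  # definitely no match: dropped before the join
        # The exact lookup discards any Bloom false positives.
        for match in index.get(key, []):
            joined.append((row, match))
    return joined
```

In a single process the exact index makes the filter redundant. The point of the sketch is the order of operations: in a distributed job, only the compact filter would be shipped to every map task, so non-matching rows are dropped before they are shuffled to the join.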

On Jun 25, 2012, at 10:56 PM, Gianmarco De Francisci Morales wrote:

> Bloom filters would help efficiency here.
> A bloom join or semi-join would be a nice addition to Pig.
>
> Cheers,
> --
> Gianmarco
>
>
>
>
> On Mon, Jun 25, 2012 at 7:50 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>> Agreed.  And with some optimization we could make semi-join more efficient
>> than this since it only needs to keep one record per key per map instead of
>> all the records for a key.
>>
>> Alan.
>>
>> On Jun 25, 2012, at 10:17 AM, Russell Jurney wrote:
>>
>>> This could be a cool rewrite feature like CUBE/SAMPLE.
>>>
>>> Russell Jurney http://datasyndrome.com
>>>
>>> On Jun 25, 2012, at 9:39 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
>>>
>>>> This type of IN is really a semi-join.  So you could rewrite this as:
>>>>
>>>> B1 = cogroup A by A1, C by A1;
>>>> B2 = filter B1 by SIZE(C) > 0;
>>>> B = foreach B2 generate flatten(A);
>>>>
>>>> Alan.
>>>>
>>>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> In SQL, there is an IN clause, which is used to check whether a value
>>>>> is in a set. Does Pig have the same IN clause? Such as:
>>>>>
>>>>> B = filter A by A1 in C;
>>>>>
>>>>> A, B, and C are relation names and A1 is a column name of A.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Yong
>>>>
>>
>>
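
The semi-join semantics discussed in this thread can be sketched outside Pig as follows. This is an illustration in Python, not Pig's implementation; `semi_join` and its parameters are hypothetical names. It reflects the optimization mentioned in the thread: only one entry per distinct key of C needs to be kept, not every record for that key.

```python
def semi_join(a_rows, c_rows, a_key=0, c_key=0):
    """Return the rows of A whose key appears in C (the effect of `A1 in C`)."""
    # A set keeps a single entry per distinct key of C, however many
    # records of C share that key.
    c_keys = {row[c_key] for row in c_rows}
    # Each row of A is emitted at most once, so duplicates in C never
    # multiply the output the way a plain join would.
    return [row for row in a_rows if row[a_key] in c_keys]


A = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
C = [(2,), (2,), (3,)]
B = semi_join(A, C)  # [(2, "b"), (2, "c"), (3, "d")]
```

Key 2 occurs twice in C, yet each matching row of A appears once in B, which is what distinguishes a semi-join from an inner join.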
Johannes Schwenk 2012-07-04, 12:42
Ruslan Al-Fakikh 2012-07-04, 13:53
Johannes Schwenk 2012-07-04, 14:01
Hien Luu 2012-06-25, 14:56