Pig, mail # user - Does pig support in clause?


Re: Does pig support in clause?
Alan Gates 2012-06-26, 16:56
As of 0.10 there are UDFs for building bloom filters.  Those could be used to construct a bloom join.

Alan.
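
To illustrate the idea of a bloom join, here is a minimal sketch in Python rather than Pig. It does not use Pig's bloom filter UDFs, whose interface is not shown in this thread; the `BloomFilter` class, the `bloom_join` function, and the sizing defaults are all hypothetical names chosen for the example. The filter is built from the keys of the smaller relation, used to discard rows of the larger relation that cannot match, and an exact join on the survivors removes any false positives.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def bloom_join(large, small, large_key, small_key):
    """Join two lists of dicts, pre-filtering `large` with a Bloom filter."""
    bloom = BloomFilter()
    for row in small:
        bloom.add(row[small_key])

    # Cheap pre-filter: drops most rows of `large` that have no partner.
    candidates = [row for row in large if bloom.might_contain(row[large_key])]

    # Exact hash join on the survivors removes any false positives.
    index = {}
    for row in small:
        index.setdefault(row[small_key], []).append(row)
    return [(l, s) for l in candidates for s in index.get(l[large_key], [])]
```

In a distributed setting the benefit comes from the pre-filter step: the filter is small enough to ship to every mapper, so non-matching rows of the large relation are dropped before the shuffle.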

On Jun 25, 2012, at 10:56 PM, Gianmarco De Francisci Morales wrote:

> Bloom filters would help efficiency here.
> A bloom join or semi-join would be a nice addition to Pig.
>
> Cheers,
> --
> Gianmarco
>
>
>
>
> On Mon, Jun 25, 2012 at 7:50 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>> Agreed.  And with some optimization we could make semi-join more efficient
>> than this since it only needs to keep one record per key per map instead of
>> all the records for a key.
>>
>> Alan.
>>
>> On Jun 25, 2012, at 10:17 AM, Russell Jurney wrote:
>>
>>> This could be a cool rewrite feature like CUBE/SAMPLE.
>>>
>>> Russell Jurney http://datasyndrome.com
>>>
>>> On Jun 25, 2012, at 9:39 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
>>>
>>>> This type of "in" is really a semi-join.  So you could rewrite this as:
>>>>
>>>> B1 = cogroup A by A1, C by A1;
>>>> B2 = filter B1 by SIZE(C) > 0;
>>>> B = foreach B2 generate flatten(A);
>>>>
>>>> Alan.
>>>>
>>>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> In SQL, there is an IN clause, which is used to check whether a value
>>>>> is in a set. Does Pig have an equivalent IN clause? For example:
>>>>>
>>>>> B = filter A by A1 in C;
>>>>>
>>>>> A, B, and C are relation names, and A1 is a column name of A.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Yong
>>>>
>>
>>
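
The semi-join semantics discussed in this thread can be sketched in a few lines of Python. This is an illustration of the idea only, not Pig code and not how Pig implements anything; the function name `semi_join` and the list-of-dicts representation of relations are assumptions made for the example. It also shows the point about needing only one record per key: the right-hand relation is reduced to a set of keys, so duplicate keys in C neither cost extra memory nor duplicate rows of A.

```python
def semi_join(left, right, left_key, right_key):
    """Return the rows of `left` whose key appears in `right`.

    Equivalent in spirit to SQL's `WHERE left_key IN (SELECT right_key ...)`.
    """
    # Only one entry per distinct key is kept from the right-hand side.
    keys = {row[right_key] for row in right}
    # Each left row is emitted at most once, however often its key occurs in `right`.
    return [row for row in left if row[left_key] in keys]


if __name__ == "__main__":
    A = [{"A1": 1}, {"A1": 2}, {"A1": 2}, {"A1": 3}]
    C = [{"A1": 2}, {"A1": 2}, {"A1": 4}]
    B = semi_join(A, C, "A1", "A1")
    print(B)
```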