Pig, mail # user - Does pig support in clause?


+
yonghu 2012-06-25, 09:50
+
Alan Gates 2012-06-25, 16:39
+
Russell Jurney 2012-06-25, 17:17
+
Alan Gates 2012-06-25, 17:50
Re: Does pig support in clause?
Gianmarco De Francisci Mo... 2012-06-26, 05:56
Bloom filters would help efficiency here.
A bloom join or semi-join would be a nice addition to Pig.
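
In the meantime, a semi-join can be approximated with operators Pig already has. What follows is only a minimal sketch, not a built-in feature: the file paths and the extra column `val` are hypothetical, and it assumes C carries its key in a column named A1, as in the original question. Reducing C to its distinct keys first means each key contributes a single record, so the join cannot duplicate rows of A.

```pig
-- Hypothetical inputs: paths and schemas are assumptions for illustration.
A = LOAD 'a.tsv' AS (A1:chararray, val:chararray);
C = LOAD 'c.tsv' AS (A1:chararray);

-- Keep one record per key from C.
C_keys = FOREACH C GENERATE A1;
C_dist = DISTINCT C_keys;

-- The inner join keeps only the rows of A whose key appears in C.
J = JOIN A BY A1, C_dist BY A1;

-- Project back to A's columns, dropping the key copied from C.
B = FOREACH J GENERATE A::A1 AS A1, A::val AS val;
```

If the distinct keys of C fit in memory, adding USING 'replicated' to the JOIN (with C_dist listed last) should let Pig do the join map-side.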

Cheers,
--
Gianmarco
On Mon, Jun 25, 2012 at 7:50 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> Agreed.  And with some optimization we could make semi-join more efficient
> than this since it only needs to keep one record per key per map instead of
> all the records for a key.
>
> Alan.
>
> On Jun 25, 2012, at 10:17 AM, Russell Jurney wrote:
>
> > This could be a cool rewrite feature like CUBE/SAMPLE.
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On Jun 25, 2012, at 9:39 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> >> This type of IN is really a semi-join.  So you could rewrite this as:
> >>
> >> B1 = cogroup A by A1, C by A1;
> >> B2 = filter B1 by SIZE(C) > 0;
> >> B = foreach B2 generate flatten(A);
> >>
> >> Alan.
> >>
> >> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
> >>
> >>> Dear all,
> >>>
> >>> In SQL, there is an IN clause, which is used to check whether a value
> >>> is in a set. Does Pig have the same IN clause? Such as:
> >>>
> >>> B = filter A by A1 in C;
> >>>
> >>> A, B, and C are relation names and A1 is a column name of A.
> >>>
> >>> Thanks!
> >>>
> >>> Yong
> >>
>
>
+
Alan Gates 2012-06-26, 16:56
+
Johannes Schwenk 2012-07-04, 12:42
+
Ruslan Al-Fakikh 2012-07-04, 13:53
+
Johannes Schwenk 2012-07-04, 14:01
+
Hien Luu 2012-06-25, 14:56