yonghu 2012-06-25, 09:50
Alan Gates 2012-06-25, 16:39
Russell Jurney 2012-06-25, 17:17
Alan Gates 2012-06-25, 17:50
Re: Does pig support in clause?
Gianmarco De Francisci Mo... 2012-06-26, 05:56
Bloom filters would help efficiency here.
A bloom join or semi-join would be a nice addition to Pig.

Cheers,
--
Gianmarco

On Mon, Jun 25, 2012 at 7:50 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> Agreed. And with some optimization we could make semi-join more efficient
> than this since it only needs to keep one record per key per map instead of
> all the records for a key.
>
> Alan.
>
> On Jun 25, 2012, at 10:17 AM, Russell Jurney wrote:
>
> > This could be a cool rewrite feature like CUBE/SAMPLE.
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On Jun 25, 2012, at 9:39 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> >> This type of in is really a semi-join. So you could rewrite this as:
> >>
> >> B1 = join A by A1, C by A1;
> >> B2 = filter B1 by SIZE(C) > 0;
> >> B = foreach B2 flatten(A);
> >>
> >> Alan.
> >>
> >> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
> >>
> >>> Dear all,
> >>>
> >>> in the sql, there is a in clause which is used to check if the value
> >>> is in a set or not? Does pig also have the same in clause? Such as:
> >>>
> >>> B = filter A by A1 in C;
> >>>
> >>> A,B,C are relation names and A1 is a column_name of A.
> >>>
> >>> Thanks!
> >>>
> >>> Yong
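
For readers trying the quoted rewrite: as written it is unlikely to parse, since `SIZE(C)` only resolves when `C` is a bag (which `cogroup` produces and `join` does not), and `foreach` needs a `generate`. Below is a minimal sketch of the same semi-join idea using `cogroup`. The file names and the schemas in the `load` statements are hypothetical, and it assumes, as the quoted snippet does, that the key column in C is also called A1.

```pig
-- Hypothetical inputs: tab-separated files; adjust paths and schemas as needed.
A = load 'a.tsv' as (A1:chararray, A2:chararray);
C = load 'c.tsv' as (A1:chararray);

-- One output record per key: (group, {A records}, {C records}).
B1 = cogroup A by A1, C by A1;

-- Keep only keys that occur at least once in C.
B2 = filter B1 by not IsEmpty(C);

-- Unnest the A records for the surviving keys; this is the semi-join result.
B = foreach B2 generate flatten(A);

dump B;
```

Keys that appear only in C leave an empty A bag, and `flatten` of an empty bag emits no rows, so those keys drop out without a separate filter. Duplicate keys in C do not duplicate rows of A here, which is the difference from a plain inner join.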
Alan Gates 2012-06-26, 16:56
Johannes Schwenk 2012-07-04, 12:42
Ruslan Al-Fakikh 2012-07-04, 13:53
Johannes Schwenk 2012-07-04, 14:01
Hien Luu 2012-06-25, 14:56