-
filter duplicates from a bag
Marco Cadetg 2012-08-24, 09:35
Hi there,
What is the best way to retrieve duplicates from a bag? I would basically like to do the opposite of DISTINCT.
A: {userid: long,foo: long,bar: long}

dump A
(1,2,3)
(1,2,3)
(1,3,2)
(2,3,1)

Now I would like to have a bag which contains
(1,2,3)
(1,2,3)

Thanks,
-Marco
-
Re: filter duplicates from a bag
Gianmarco De Francisci Mo... 2012-08-24, 10:19
I would say something along these lines:
B = group A by *;
C = foreach B generate group, COUNT(A) as count;
D = filter C by count > 1;
E = foreach D generate group;
Disclaimer: untested code.
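One thing to watch: in the script above, E holds each duplicated key once (as the group tuple), whereas the question asks for every copy, i.e. (1,2,3) twice. A minimal sketch of a variant that keeps all copies follows. It is equally untested, and the input path and the comma delimiter in the LOAD line are placeholders that do not come from this thread. It names the grouping fields explicitly and filters on the grouped bag directly.

```pig
-- Placeholder input: the path and delimiter are assumptions for illustration.
A = LOAD 'input.csv' USING PigStorage(',') AS (userid:long, foo:long, bar:long);

-- Group identical tuples; each group carries a bag named A with all its members.
B = GROUP A BY (userid, foo, bar);

-- Keep only the groups that hold more than one tuple.
C = FILTER B BY COUNT(A) > 1;

-- Flatten the bag so every copy of a duplicated tuple is emitted again.
D = FOREACH C GENERATE FLATTEN(A);

DUMP D;
```

With the sample data from the question, the intent is that D ends up with both copies of (1,2,3) and nothing else. COUNT skips tuples whose first field is null, so if userid can be null, COUNT_STAR may be the safer choice.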
Cheers, -- Gianmarco
-
Re: filter duplicates from a bag
Marco Cadetg 2012-08-24, 10:25
Thanks Gianmarco, that is what I was looking for! -Marco