Pig, mail # user - removing dupes from a bag while saving first occurrence


Re: removing dupes from a bag while saving first occurrence
Norbert Burger 2013-03-08, 22:10
Looking at your sample, it seems you have a GROUP BY generating these
bags...?  Could you just insert a DISTINCT before this GROUP BY?

Norbert

On Fri, Mar 8, 2013 at 5:00 PM, Chan, Tim <[EMAIL PROTECTED]> wrote:

> If I have a bag and would like to remove dupes, while saving the first
> occurrence, is this possible?
>
> For example, for the following bag:
>
> (group_1,{(2012-12-15,a),(2012-12-17,a),(2012-12-23,c)})
>
> I would like my result to be the following:
>
> (group_1,{(2012-12-15,a),(2012-12-23,c)})
>
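
One possible way to get the result in the quoted example, as a rough sketch rather than a tested answer. It assumes "first occurrence" means the earliest date for each value within a group, and that the data is available before grouping as flat rows. The file name `events.tsv` and the field names `grp`, `dt`, and `val` are hypothetical. Because the two `a` tuples carry different dates, the sketch groups on the value and keeps the minimum date instead of relying on DISTINCT, which only drops fully identical tuples.

```pig
-- Hypothetical input: one row per (grp, dt, val), tab-separated.
-- Dates are assumed to be ISO strings (yyyy-MM-dd), so string order is date order.
events = LOAD 'events.tsv' AS (grp:chararray, dt:chararray, val:chararray);

-- One group per (grp, val) pair; keep only the earliest date in each.
by_val = GROUP events BY (grp, val);
firsts = FOREACH by_val GENERATE
             group.grp      AS grp,
             MIN(events.dt) AS dt,
             group.val      AS val;

-- Rebuild one bag per grp from the surviving rows.
by_grp = GROUP firsts BY grp;
result = FOREACH by_grp GENERATE group, firsts.(dt, val);

DUMP result;
```

Pig bags are unordered, so the tuples inside each output bag may not appear in date order. If order matters, a nested ORDER inside the final FOREACH would be one option.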