Pig >> mail # user >> removing dupes from a bag while saving first occurrence


Chan, Tim 2013-03-08, 22:00
Norbert Burger 2013-03-08, 22:10
Chan, Tim 2013-03-08, 23:12
Re: removing dupes from a bag while saving first occurrence
Did you try ordering them by date before grouping them?
On Mar 9, 2013 12:12 AM, "Chan, Tim" <[EMAIL PROTECTED]> wrote:

> Using a distinct before the group by does not guarantee the date order. I
> need to keep the earliest occurrence of 'a' and discard all later
> occurrences of 'a'.
>
>
> On Fri, Mar 8, 2013 at 2:10 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Looking at your sample, it seems you have a GROUP BY generating these
> > bags...?  Could you just insert a DISTINCT before this GROUP BY?
> >
> > Norbert
> >
> > On Fri, Mar 8, 2013 at 5:00 PM, Chan, Tim <[EMAIL PROTECTED]> wrote:
> >
> > > If I have a bag and would like to remove dupes, while saving the first
> > > occurrence, is this possible?
> > >
> > > For example, for the following bag:
> > >
> > > (group_1,{(2012-12-15,a),(2012-12-17,a),(2012-12-23,c)})
> > >
> > > I would like my result to be the following:
> > >
> > > (group_1,{(2012-12-15,a),(2012-12-23,c)})
> > >
> >
>
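
For reference, here is a minimal Pig Latin sketch of one way to keep only the earliest occurrence of each value per group. It is not taken from the thread. It assumes the data is still available as flat rows before the GROUP BY, and that dates are ISO-formatted strings, so the lexicographic minimum is also the earliest date. The file name 'input.csv' and the field names grp, dt, and val are hypothetical. Instead of relying on row order surviving the GROUP, it takes MIN(dt) for each (grp, val) pair and then regroups.

```pig
-- Hypothetical input: one row per (grp, dt, val), e.g. group_1,2012-12-15,a
raw = LOAD 'input.csv' USING PigStorage(',')
      AS (grp:chararray, dt:chararray, val:chararray);

-- Put all occurrences of the same value within a group into one bag
by_val = GROUP raw BY (grp, val);

-- Keep only the earliest date for each (grp, val) pair;
-- MIN on chararray works here because ISO dates sort lexicographically
firsts = FOREACH by_val GENERATE
             group.grp   AS grp,
             MIN(raw.dt) AS dt,
             group.val   AS val;

-- Rebuild one bag per group, sorted by date inside a nested FOREACH
regrouped = GROUP firsts BY grp;
result = FOREACH regrouped {
    ordered = ORDER firsts BY dt;
    GENERATE group, ordered.(dt, val);
};

DUMP result;
```

With the three sample rows from the original question, this should yield (group_1,{(2012-12-15,a),(2012-12-23,c)}). If other columns need to travel with the first occurrence, a nested ORDER plus LIMIT 1 per (grp, val) pair is a possible alternative to MIN.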
Panshul Whisper 2013-03-08, 23:21