Pig >> mail # user >> removing dupes from a bag while saving first occurrence


Chan, Tim 2013-03-08, 22:00
Norbert Burger 2013-03-08, 22:10
Chan, Tim 2013-03-08, 23:12
Re: removing dupes from a bag while saving first occurrence
Did you try ordering them by date before grouping them?
On Mar 9, 2013 12:12 AM, "Chan, Tim" <[EMAIL PROTECTED]> wrote:

> Using a distinct before the group by does not guarantee the date order. I
> need to keep the earliest occurrence of 'a' and discard all later
> occurrences of 'a'.
>
>
> On Fri, Mar 8, 2013 at 2:10 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Looking at your sample, it seems you have a GROUP BY generating these
> > bags...?  Could you just insert a DISTINCT before this GROUP BY?
> >
> > Norbert
> >
> > On Fri, Mar 8, 2013 at 5:00 PM, Chan, Tim <[EMAIL PROTECTED]> wrote:
> >
> > > If I have a bag and would like to remove dupes, while saving the first
> > > occurrence, is this possible?
> > >
> > > For example, for the following bag:
> > >
> > > (group_1,{(2012-12-15,a),(2012-12-17,a),(2012-12-23,c)})
> > >
> > > I would like my result to be the following:
> > >
> > > (group_1,{(2012-12-15,a),(2012-12-23,c)})
> > >
> >
>
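
To make the requested behaviour concrete, here is a minimal sketch in plain Python (not Pig) of "order by date, then keep only the first occurrence of each value". The function name keep_first_occurrence and the list-of-tuples representation of the bag are assumptions made for illustration; the thread does not show how the bag is produced. It also assumes the dates are ISO-formatted strings, as in the sample, so that sorting them as strings matches chronological order.

```python
def keep_first_occurrence(bag):
    """Return (date, value) tuples, keeping only the earliest date per value.

    `bag` is a hypothetical stand-in for one Pig bag: an iterable of
    (date, value) tuples with ISO-formatted date strings.
    """
    seen = set()
    result = []
    # Sort by date first so the earliest occurrence of each value comes first.
    for date, value in sorted(bag, key=lambda t: t[0]):
        if value not in seen:
            seen.add(value)
            result.append((date, value))
    return result


# The sample bag from the original question.
bag = [("2012-12-15", "a"), ("2012-12-17", "a"), ("2012-12-23", "c")]
print(keep_first_occurrence(bag))
# [('2012-12-15', 'a'), ('2012-12-23', 'c')]
```

In Pig itself, the same logic would presumably have to be applied per group, for example inside a nested FOREACH or in a UDF; that mapping is an assumption and is not confirmed anywhere in this thread.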
Panshul Whisper 2013-03-08, 23:21