Pig >> mail # user >> DISTINCT with 2 fields in a tuple


Re: DISTINCT with 2 fields in a tuple
Exactly like you posted.

Cheers,
--
Gianmarco

On Thu, Apr 12, 2012 at 16:55, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> How can I do distinct with foreach? Are those 2 separate statements, like the
> ones I posted, or something different?
>
> On Thu, Apr 12, 2012 at 7:49 AM, Gianmarco De Francisci Morales <
> [EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > Distinct with the foreach is more efficient than grouping. As long as you
> > don't need the rest of the data, you are better off with this solution.
> >
> > With the syntax A.FORM_ID, A.SET_ID you are invoking scalar projection;
> > that is, you are telling Pig to treat the value as a scalar. The right
> > syntax is the first one (without the "A." in front).
> >
> > Cheers,
> > --
> > Gianmarco
> >
> >
> >
> > On Wed, Apr 11, 2012 at 23:06, Mohit Anchlia <[EMAIL PROTECTED]>
> > wrote:
> >
> > > > Thanks, I tried something like this and it worked, but I have one more
> > > > question:
> > >
> > >
> > > grunt> B = foreach A GENERATE FORM_ID, SET_ID;
> > >
> > > > grunt> C = DISTINCT B;
> > >
> > > > What's the difference between foreach A GENERATE FORM_ID, SET_ID; and
> > > > foreach A GENERATE A.FORM_ID, A.SET_ID;? To me they look the same, but
> > > > the results are different.
> > >
> > > > On Wed, Apr 11, 2012 at 1:57 PM, Prashant Kommireddi
> > > > <[EMAIL PROTECTED]> wrote:
> > >
> > > > You are doing a distinct on a Tuple, and not a Bag?
> > > >
> > > > > In your example, DISTINCT on a field name on each record/tuple would
> > > > > not make sense, as it's always a single value. You need to group by a
> > > > > certain key before a distinct.
> > > >
> > > >
> > > > > On Wed, Apr 11, 2012 at 1:53 PM, Mohit Anchlia
> > > > > <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > I am trying to get the distinct values of 2 fields in a record,
> > > > > > something like select distinct a, b from c; So I wrote this in Pig,
> > > > > > but it is not working. I did:
> > > > >
> > > > >
> > > > > > A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t') AS
> > > > > > (FILE_NAME:chararray,FORM_ID:chararray,SET_ID:chararray);
> > > > >
> > > > > B = foreach A {dist = DISTINCT A.FORM_ID, A.SET_ID; GENERATE dist;}
> > > > >
> > > > > > ERROR 1000: Error during parsing. Invalid alias: A in {FILE_NAME: chararray
> > > > > > ...
> > > > >
> > > > > > But this doesn't seem to be working. I thought A is a tuple and
> > > > > > form_id and set_id are fields that I can do DISTINCT on. I saw a
> > > > > > similar example online, but not exactly the same.
> > > > >
> > > >
> > >
> >
>
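
For reference, the two approaches discussed in this thread can be sketched as one script. The LOAD statement and the aliases A, B and C come from the messages above. The aliases G, D, pairs and dist, and the choice of FILE_NAME as the grouping key, are assumptions made for illustration only. The second variant follows the suggestion to group before a nested DISTINCT; it yields distinct pairs per group rather than across the whole input.

```pig
-- Load the tab-separated records (path and schema as posted in the thread).
A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t')
    AS (FILE_NAME:chararray, FORM_ID:chararray, SET_ID:chararray);

-- Approach 1 (the one confirmed in the thread): project the two fields
-- without the "A." prefix, then apply DISTINCT to the whole relation.
B = FOREACH A GENERATE FORM_ID, SET_ID;
C = DISTINCT B;

-- Approach 2 (illustrative): a nested DISTINCT operates on a bag, so the
-- relation is grouped first. FILE_NAME as the key is a hypothetical choice.
G = GROUP A BY FILE_NAME;
D = FOREACH G {
    pairs = A.(FORM_ID, SET_ID);   -- bag projection inside the nested block
    dist  = DISTINCT pairs;
    GENERATE group AS FILE_NAME, FLATTEN(dist);
};

DUMP C;
```

Inside the nested block, A refers to the bag produced by GROUP, which is why A.(FORM_ID, SET_ID) is valid there but A.FORM_ID is not valid in a plain FOREACH over A itself.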