Pig >> mail # user >> DISTINCT with 2 fields in a tuple


Re: DISTINCT with 2 fields in a tuple
Thanks, I tried something like this and it worked, but I have one more
question:
grunt> B = foreach A GENERATE FORM_ID, SET_ID;

grunt> C = DISTINCT B;

What's the difference between foreach A GENERATE FORM_ID, SET_ID; and
foreach A GENERATE A.FORM_ID, A.SET_ID;? To me they look the same, but
the results are different.
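
For reference, the working approach above can be written out as a complete script. This is a minimal sketch that reuses the load path and schema from the original post; nothing else is assumed. Inside FOREACH A, a bare name such as FORM_ID refers to a field of the current tuple. A.FORM_ID instead asks Pig to project FORM_ID out of something named A, and since the tuples of A have no field called A, it is not the same expression. How Pig handles that form (a parse error, or treating A as a single-row scalar relation) appears to depend on the Pig version, so that part is an assumption rather than something confirmed in this thread.

```pig
-- Load path and schema copied from the original post.
A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t')
    AS (FILE_NAME:chararray, FORM_ID:chararray, SET_ID:chararray);

-- Bare field names are resolved against the current tuple of A.
B = FOREACH A GENERATE FORM_ID, SET_ID;

-- DISTINCT compares whole tuples of B, so this corresponds to
-- SELECT DISTINCT FORM_ID, SET_ID in SQL.
C = DISTINCT B;

DUMP C;
```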

On Wed, Apr 11, 2012 at 1:57 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> You are doing a distinct on a Tuple, and not a Bag?
>
> In your example, DISTINCT on a field name on each record/tuple would not make
> sense, as it's always a single value. You need to group by a certain key
> before a distinct.
>
>
> On Wed, Apr 11, 2012 at 1:53 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > I am trying to get distinct values from 2 fields in a record, something like
> > select distinct a, b from c; So I wrote this in Pig, which is not
> > working. I did:
> >
> >
> > A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t') AS
> > (FILE_NAME:chararray,FORM_ID:chararray,SET_ID:chararray);
> >
> > B = foreach A {dist = DISTINCT A.FORM_ID, A.SET_ID; GENERATE dist;}
> >
> > ERROR 1000: Error during parsing. Invalid alias: A in {FILE_NAME: chararray
> > ...
> >
> > But this doesn't seem to be working. I thought A is a tuple and form_id and
> > set_id are fields that I can do DISTINCT on. I saw a similar example online,
> > but not exactly the same.
> >
>
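
The "group by a key before a distinct" suggestion quoted above can be sketched as follows. A nested DISTINCT needs a bag to work on, and inside FOREACH a bag named A only exists after a GROUP, which is why the original nested form failed with "Invalid alias: A". Grouping by FILE_NAME is purely an assumption for illustration; the thread does not say which key, if any, the poster wanted. The aliases G, R, pairs and dist are likewise hypothetical.

```pig
-- Load path and schema copied from the original post.
A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t')
    AS (FILE_NAME:chararray, FORM_ID:chararray, SET_ID:chararray);

-- Assumed grouping key: after GROUP, each tuple of G holds the key
-- (as 'group') and a bag named A with the matching input tuples.
G = GROUP A BY FILE_NAME;

R = FOREACH G {
    -- Project the two fields out of the bag A, then de-duplicate the pairs.
    pairs = A.(FORM_ID, SET_ID);
    dist  = DISTINCT pairs;
    GENERATE group AS FILE_NAME, dist;
};

DUMP R;
```

This yields one bag of distinct (FORM_ID, SET_ID) pairs per FILE_NAME. For a plain SELECT DISTINCT over the whole input, projecting the two fields and applying the top-level DISTINCT is the simpler route.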