Re: DISTINCT with 2 fields in a tuple
Thanks, I tried something like this and it worked, but I have one more
question:
grunt> B = foreach A GENERATE FORM_ID, SET_ID;

grunt> C = DISTINCT B;

What's the difference between foreach A GENERATE FORM_ID, SET_ID; and
foreach A GENERATE A.FORM_ID, A.SET_ID;? To me they look the same, but
the results are different.
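
A minimal sketch of how the two forms are resolved, assuming the same input path and schema as in the original message. Grouping by FILE_NAME is only an illustrative assumption and is not taken from the thread. Inside a FOREACH over A, a bare name such as FORM_ID is a field of the current record. The alias.field form is a bag projection, so it only refers to per-record data when the record contains a bag named A, which is the case after a GROUP. Without a GROUP, newer Pig releases (0.8 and later, as far as I know) appear to treat A.FORM_ID as a scalar reference to the whole relation A, which would explain the different results.

```pig
-- Same LOAD as in the original message.
A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t')
    AS (FILE_NAME:chararray, FORM_ID:chararray, SET_ID:chararray);

-- Bare names: fields of the current record of A.
B = FOREACH A GENERATE FORM_ID, SET_ID;
C = DISTINCT B;   -- comparable to: select distinct FORM_ID, SET_ID

-- After GROUP, each record holds a bag named A, so A.(...) is a bag projection.
-- Grouping by FILE_NAME is a hypothetical choice for illustration.
G = GROUP A BY FILE_NAME;
D = FOREACH G {
    pairs = A.(FORM_ID, SET_ID);   -- project two fields out of the bag
    dist  = DISTINCT pairs;        -- nested DISTINCT operates on a bag
    GENERATE group, dist;
}
```

C gives the distinct (FORM_ID, SET_ID) pairs over the whole input, while D gives the distinct pairs per FILE_NAME.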

On Wed, Apr 11, 2012 at 1:57 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> You are doing a DISTINCT on a tuple, and not a bag?
>
> In your example, DISTINCT on a field of each record/tuple would not make
> sense, as it's always a single value. You need to group by a certain key
> before a DISTINCT.
>
>
> On Wed, Apr 11, 2012 at 1:53 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > I am trying to get distinct values from 2 fields in a record, something like
> > select distinct a, b from c; So I wrote this in Pig, but it is not
> > working. I did:
> >
> >
> > A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t') AS
> > (FILE_NAME:chararray,FORM_ID:chararray,SET_ID:chararray);
> >
> > B = foreach A {dist = DISTINCT A.FORM_ID, A.SET_ID; GENERATE dist;}
> >
> > ERROR 1000: Error during parsing. Invalid alias: A in {FILE_NAME: chararray
> > ...
> >
> > But this doesn't seem to be working. I thought A is a tuple and form_id and
> > set_id are fields that I can do DISTINCT on. I saw a similar example online,
> > but not exactly the same.
> >
>