Pig, mail # user - DISTINCT with 2 fields in a tuple

Re: DISTINCT with 2 fields in a tuple
Prashant Kommireddi 2012-04-11, 20:57
Are you doing a DISTINCT on a tuple rather than a bag?

In your example, a DISTINCT on a field of each record/tuple would not make
sense, since it's always a single value. You need to group by some key
before applying a DISTINCT.
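
A minimal sketch of both approaches, reusing the path and schema from the quoted script. The aliases B, C, G, D, pairs and dist, and the choice of FILE_NAME as the grouping key, are only illustrative assumptions. The first form is probably the closest equivalent of "select distinct a, b from c": project the two fields, then apply DISTINCT to the whole relation. The second shows a nested DISTINCT, which operates on the bag produced by a GROUP.

```pig
A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t') AS
    (FILE_NAME:chararray, FORM_ID:chararray, SET_ID:chararray);

-- Relation-level DISTINCT: keep only the two fields of interest,
-- then remove duplicate (FORM_ID, SET_ID) tuples.
B = FOREACH A GENERATE FORM_ID, SET_ID;
C = DISTINCT B;
DUMP C;

-- Nested DISTINCT: group first so that A becomes a bag inside each group.
-- Grouping by FILE_NAME is an assumption; use whatever key fits your data.
G = GROUP A BY FILE_NAME;
D = FOREACH G {
    pairs = A.(FORM_ID, SET_ID);
    dist = DISTINCT pairs;
    GENERATE group AS FILE_NAME, dist;
};
DUMP D;
```

Inside the nested block, A refers to the bag of records in the current group, which is why the same reference fails when the FOREACH runs directly over the ungrouped relation A.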
On Wed, Apr 11, 2012 at 1:53 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> I am trying to get the distinct values of 2 fields in a record, something
> like "select distinct a, b from c;". So I wrote this in Pig, which is not
> working. I did:
> A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t') AS
> (FILE_NAME:chararray,FORM_ID:chararray,SET_ID:chararray);
> B = foreach A {dist = DISTINCT A.FORM_ID, A.SET_ID; GENERATE dist;}
> ERROR 1000: Error during parsing. Invalid alias: A in {FILE_NAME: chararray
> ...
> But this doesn't seem to be working. I thought A is a tuple, and form_id and
> set_id are fields that I can do a DISTINCT on. I saw a similar example online,
> but not exactly the same.