Pig >> mail # user >> DISTINCT with 2 fields in a tuple


Mohit Anchlia 2012-04-11, 20:53
Re: DISTINCT with 2 fields in a tuple
Just group on those 2 fields. The 'group' field of the output will contain all the
distinct combinations, so no 'DISTINCT' is really necessary.
That is, of course, if that is what you wanted to do in the first place.
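
For illustration, here is a rough sketch of the grouping approach, reusing the LOAD statement from the question below. The aliases G, PAIRS, P and D are made up for this example. The second variant (project the two fields, then apply DISTINCT to the whole relation) is not what the reply above describes; it is only shown as a common alternative, assuming the goal is the SQL-style "select distinct a, b".

```pig
-- Load the three-column, tab-separated file from the question.
A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t')
    AS (FILE_NAME:chararray, FORM_ID:chararray, SET_ID:chararray);

-- Variant 1: group on both fields. Each 'group' tuple in G is one
-- distinct (FORM_ID, SET_ID) combination; FLATTEN turns it back into
-- two plain columns.
G = GROUP A BY (FORM_ID, SET_ID);
PAIRS = FOREACH G GENERATE FLATTEN(group) AS (FORM_ID, SET_ID);

-- Variant 2 (alternative, not from the reply): keep only the two
-- fields, then de-duplicate the resulting relation.
P = FOREACH A GENERATE FORM_ID, SET_ID;
D = DISTINCT P;

DUMP PAIRS;
DUMP D;
```

Both variants should give the same set of pairs. Grouping is handy if you also want per-pair aggregates such as COUNT(A) in the same FOREACH.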

On Apr 11, 2012, at 1:53 PM, Mohit Anchlia wrote:

> I am trying to get the distinct values of 2 fields in a record, something like
> select distinct a, b from c; so I wrote this in Pig, which is actually not
> working. I did:
>
>
> A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t') AS
> (FILE_NAME:chararray,FORM_ID:chararray,SET_ID:chararray);
>
> B = foreach A {dist = DISTINCT A.FORM_ID, A.SET_ID; GENERATE dist;}
>
> ERROR 1000: Error during parsing. Invalid alias: A in {FILE_NAME: chararray
> ...
>
> But this doesn't seem to be working. I thought A is a tuple and form_id and
> set_id are fields that I can do DISTINCT on. I saw a similar example online,
> but not exactly the same.
Prashant Kommireddi 2012-04-11, 20:57
Mohit Anchlia 2012-04-11, 21:06
Gianmarco De Francisci Mo... 2012-04-12, 14:49
Mohit Anchlia 2012-04-12, 14:55
Gianmarco De Francisci Mo... 2012-04-12, 15:01