Pig >> mail # user >> about distinct


Re: about distinct
On 3/5/12 7:19 PM, guoyun wrote:
> Dear All:
> This is the wiki's description of distinct:
>
> grunt> A = load 'mydata' using PigStorage() as (a, b, c);
> grunt> B = group A by a;
> grunt> C = foreach B {
>            D = distinct A.b;
>            generate flatten(group), COUNT(D);
>        }
>
> But suppose field b has subfields, for example:
> A = load 'mydata' using PigStorage() as (a, b:tuple(b1, b2, b3), c);
>
> If I want to apply distinct to A.b.b1, how can I do it? Pig does not
> allow D = distinct A.b.b1;
>
> Thank you!
>
>
>
You need to use another nested foreach statement:

  C = foreach B {
      B1BAG = foreach A generate b.b1;
      D = distinct B1BAG;
      generate flatten(group), COUNT(D);
  }

-Thejas
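As a rough illustration of what the nested foreach in the reply computes, the Python sketch below mimics the same steps on in-memory data. It is a simulation, not Pig: it groups rows by a, projects b1 out of each b tuple, removes duplicates, and counts. The function name count_distinct_b1 and the sample rows are hypothetical. It assumes each row is (a, (b1, b2, b3), c), matching the schema in the question, and it follows Pig's documented COUNT behavior of skipping tuples whose first field is null.

```python
from collections import defaultdict


def count_distinct_b1(rows):
    """Hypothetical simulation of:
        B = group A by a;
        C = foreach B { B1BAG = foreach A generate b.b1;
                        D = distinct B1BAG;
                        generate flatten(group), COUNT(D); }
    Each row is assumed to be (a, (b1, b2, b3), c).
    """
    # B = group A by a: collect the b tuples for each key a
    groups = defaultdict(list)
    for a, b, _c in rows:
        groups[a].append(b)

    result = []
    for key, bag in groups.items():
        # B1BAG = foreach A generate b.b1 (a null b projects to null)
        b1_bag = [None if b is None else b[0] for b in bag]
        # D = distinct B1BAG
        distinct_b1 = set(b1_bag)
        # COUNT(D) ignores tuples whose (only) field is null
        count = sum(1 for v in distinct_b1 if v is not None)
        result.append((key, count))
    # Sorted for a stable result; assumes the group keys are mutually comparable
    return sorted(result)


rows = [
    ("x", (1, "p", "q"), 0),
    ("x", (1, "r", "s"), 0),
    ("x", (2, "t", "u"), 0),
    ("y", (3, "v", "w"), 0),
]
print(count_distinct_b1(rows))  # [('x', 2), ('y', 1)]
```

Group "x" has b1 values 1, 1 and 2, so only two distinct values are counted. This mirrors why the extra nested foreach is needed: it first turns the bag of b tuples into a bag of b1 values, and only then can distinct operate on it.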