
Pig, mail # user - about distinct


Re: about distinct
Thejas Nair 2012-03-09, 03:00
On 3/5/12 7:19 PM, guoyun wrote:
> Dear All:
> This is the wiki's description of distinct:
>
> grunt> A = load 'mydata' using PigStorage() as (a, b, c);
> grunt> B = group A by a;
> grunt> C = foreach B {
>     D = distinct A.b;
>     generate flatten(group), COUNT(D);
> }
>
> But if field b has subfields, for example:
> A = load 'mydata' using PigStorage() as (a, b:tuple(b1, b2, b3), c);
>
> and I want D = distinct A.b.b1, how can I do it? Pig does not
> allow D = distinct A.b.b1;
>
> Thank you!
>
You need to use another nested foreach statement that projects b.b1 first, then apply distinct to that bag:

  C = foreach B {
      B1BAG = foreach A generate b.b1;
      D = distinct B1BAG;
      generate flatten(group), COUNT(D);
  }

-Thejas
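To check what the nested foreach above computes, here is a small plain-Python model of the same dataflow. It is a sketch for reasoning about expected results, not Pig code. The function name `count_distinct_b1` and the sample rows are hypothetical. Each row is assumed to be (a, b, c), where b is a tuple (b1, b2, b3). The model follows Pig's documented behavior that COUNT ignores a null first field, whereas COUNT_STAR would count it.

```python
from collections import defaultdict


def count_distinct_b1(rows):
    """Hypothetical model of the Pig script:
    B = group A by a;
    C = foreach B { B1BAG = foreach A generate b.b1;
                    D = distinct B1BAG;
                    generate flatten(group), COUNT(D); }
    Returns a dict mapping each group key a to its count of distinct b1 values.
    """
    # group A by a: collect every row into a bag keyed by its first field
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    result = {}
    for key, bag in groups.items():
        # foreach A generate b.b1 (a null tuple b yields a null b1)
        b1_bag = [b[0] if b is not None else None for _, b, _ in bag]
        # distinct B1BAG: duplicate b1 values collapse to one
        distinct_b1 = set(b1_bag)
        # COUNT(D) skips entries whose value is null
        result[key] = sum(1 for v in distinct_b1 if v is not None)
    return result


# Hypothetical sample data: group "x" has b1 values 1, 1, 2; group "y" has 5.
rows = [
    ("x", (1, 10, 100), "c1"),
    ("x", (1, 20, 200), "c2"),
    ("x", (2, 30, 300), "c3"),
    ("y", (5, 10, 100), "c4"),
]
print(count_distinct_b1(rows))  # {'x': 2, 'y': 1}
```

Group "x" contains b1 = 1 twice, so distinct reduces its bag to {1, 2} and COUNT returns 2. Pig itself does not guarantee the order of the output groups.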