Pig, mail # user - how can I distinct one field of a relation


Re: how can I distinct one field of a relation
Ruslan Al-Fakikh 2012-06-27, 14:09
Hey Haitao,

I didn't get exactly what your requirement was and your example seems
to be incomplete. Here it is:

A is:
1,2,3
1,2,3
4,5,6

What I want is :
1,2,3
4,5,6

What you did here is a DISTINCT over all fields, but what if the input is
1,2,3
1,3,4
4,5,6
and you are trying to DISTINCT by the first field? What output do you
want for such a case?
Ruslan
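
For the ambiguous case above, one possible approach (a sketch, not something confirmed in this thread) is to GROUP by a1 and keep a single tuple per group. It assumes the input is comma-separated as in the example, hence the PigStorage(',') loader, and it assumes that "the row with the smallest a2, then a3" is an acceptable tie-break. The aliases G, sorted, and first are illustrative names only.

```pig
-- Assumption: 'data' is comma-delimited, as in the example rows above.
A = LOAD 'data' USING PigStorage(',') AS (a1:int, a2:int, a3:int);

-- Collect all tuples sharing the same a1 into one bag per key.
G = GROUP A BY a1;

-- Inside each group, sort to make the choice deterministic,
-- then keep exactly one tuple and flatten it back to (a1, a2, a3).
B = FOREACH G {
    sorted = ORDER A BY a2, a3;
    first  = LIMIT sorted 1;
    GENERATE FLATTEN(first);
};

DUMP B;
```

With the input 1,2,3 / 1,3,4 / 4,5,6 this should keep 1,2,3 for a1 = 1 and 4,5,6 for a1 = 4. If any row per key will do, the nested ORDER can be dropped, at the cost of an arbitrary pick.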

On Wed, Jun 27, 2012 at 3:25 PM, Subir S <[EMAIL PROTECTED]> wrote:
> If the values/fields that differ are not a problem to exclude, then maybe
> you can use a FILTER to exclude them.
> Also, as @Jonathan said, you may project the fields you want and then DISTINCT.
> He just gave an example of generating a1; you may take more fields in the
> foreach..generate clause.
>
> On Wed, Jun 27, 2012 at 3:17 PM, Haitao Yao <[EMAIL PROTECTED]> wrote:
>
>> Well, not exactly.
>> I want a subset of A with all fields, and field a1 is distinct.
>> for example:
>> A is:
>> 1,2,3
>> 1,2,3
>> 4,5,6
>>
>> What I want is :
>> 1,2,3
>> 4,5,6
>>
>> How can I do this with the keyword distinct?
>>
>>
>>
>> Haitao Yao
>> [EMAIL PROTECTED]
>> weibo: @haitao_yao
>> Skype:  haitao.yao.final
>>
>> On 2012-6-27, at 2:06 PM, Jonathan Coveney wrote:
>>
>> > If you JUST want a1, then you would do
>> > A = LOAD 'data' AS (a1:int,a2:int,a3:int);
>> > B = DISTINCT (foreach A generate a1);
>> >
>> > basically you project the column you want, and distinct on it.
>> >
>> > 2012/6/26 Haitao Yao <[EMAIL PROTECTED]>
>> >
>> >> I want a subset of A with a1 value distinct.
>> >> the current distinct will compare all the fields in A, which is not
>> >> what I want.
>> >>
>> >>
>> >>
>> >> Haitao Yao
>> >> [EMAIL PROTECTED]
>> >> weibo: @haitao_yao
>> >> Skype:  haitao.yao.final
>> >>
>> >> On 2012-6-27, at 11:18 AM, Jonathan Coveney wrote:
>> >>
>> >>> What is your desired output? Sounds like you want a group.
>> >>>
>> >>> 2012/6/26 Haitao Yao <[EMAIL PROTECTED]>
>> >>>
>> >>>> hi,
>> >>>>      How can I distinct only one field of a relation?
>> >>>>      here's the demo:
>> >>>>
>> >>>>      A = LOAD 'data' AS (a1:int,a2:int,a3:int);
>> >>>>      B = distinct A by a1;
>> >>>>
>> >>>>
>> >>>>      how can I do this?
>> >>>>
>> >>>>
>> >>>>
>> >>>> Haitao Yao
>> >>>> [EMAIL PROTECTED]
>> >>>> weibo: @haitao_yao
>> >>>> Skype:  haitao.yao.final
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>