


how can I distinct one field of a relation
Haitao Yao 20120627, 02:54
Hi, how can I apply DISTINCT to only one field of a relation? Here's a demo:
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
B = distinct A by a1;
How can I do this?
Re: how can I distinct one field of a relation
Jonathan Coveney 20120627, 03:18
What is your desired output? Sounds like you want a group.
2012/6/26 Haitao Yao <[EMAIL PROTECTED]>
Re: how can I distinct one field of a relation
Haitao Yao 20120627, 06:00
I want a subset of A in which the a1 values are distinct. The current DISTINCT compares all the fields in A, which is not what I want.
On 2012-6-27, at 11:18 AM, Jonathan Coveney wrote:
Re: how can I distinct one field of a relation
Jonathan Coveney 20120627, 06:06
If you JUST want a1, then you would do:
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
B = DISTINCT (foreach A generate a1);
Basically, you project the column you want and run DISTINCT on it.
2012/6/26 Haitao Yao <[EMAIL PROTECTED]>
Re: how can I distinct one field of a relation
Haitao Yao 20120627, 09:47
Well, not exactly. I want a subset of A with all fields, where field a1 is distinct. For example, A is:
1,2,3
1,2,3
4,5,6
What I want is:
1,2,3
4,5,6
How can I do this with the DISTINCT keyword?
On 2012-6-27, at 2:06 PM, Jonathan Coveney wrote:
Re: how can I distinct one field of a relation
Subir S 20120627, 11:25
If it is not a problem to exclude the values/fields that differ, then maybe you can use a FILTER to exclude them. Also, as @Jonathan said, you can project the fields you want and then apply DISTINCT. He just gave an example that generates a1; you can include more fields in the FOREACH ... GENERATE clause.
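
The projection idea above can be sketched as follows. This is only an illustration under assumptions: the choice of a1 and a2 as the fields to keep is arbitrary, and the aliases P and B2 are made up for the example. DISTINCT then compares only the projected fields, so a3 is dropped from the result.

```pig
-- Sketch: project more than one field, then DISTINCT on the projection.
-- The schema and LOAD statement are taken from the original question.
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Keep only the fields that should take part in the comparison
-- (a1 and a2 here are an arbitrary choice for illustration).
P = FOREACH A GENERATE a1, a2;

-- DISTINCT compares whole tuples, which are now just (a1, a2).
B2 = DISTINCT P;

DUMP B2;
```

Note that this removes duplicate (a1, a2) pairs; it does not guarantee one row per a1 value if a2 differs between rows.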
On Wed, Jun 27, 2012 at 3:17 PM, Haitao Yao <[EMAIL PROTECTED]> wrote:
Re: how can I distinct one field of a relation
Ruslan AlFakikh 20120627, 14:09
Hey Haitao,
I didn't quite get what your requirement is, and your example seems incomplete. Here it is:
A is:
1,2,3
1,2,3
4,5,6
What I want is:
1,2,3
4,5,6
What you did here is a DISTINCT over all fields. But what if the input is
1,2,3
1,3,4
4,5,6
and you are trying to DISTINCT by the first field? What output do you want in that case?
Ruslan
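
One possible way to get one full row per a1 value, following the earlier suggestion to use a group, is sketched below. It is not something the thread settles on: the rule for which row survives (smallest a2, then smallest a3) is an assumption chosen only to make the result deterministic, and the aliases G, sorted, one, and B are made up for the example. It relies on ORDER and LIMIT inside a nested FOREACH block, so it needs a Pig version that supports nested LIMIT.

```pig
-- Sketch: keep one full row per distinct a1 value.
-- The schema and LOAD statement are taken from the original question.
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Collect all rows that share the same a1 into one bag per key.
G = GROUP A BY a1;

B = FOREACH G {
    -- Assumption: when several rows share a1, keep the one with the
    -- smallest a2 (ties broken by a3). Any other rule could be used.
    sorted = ORDER A BY a2, a3;
    one = LIMIT sorted 1;
    -- FLATTEN turns the one-element bag back into a plain row.
    GENERATE FLATTEN(one);
};

DUMP B;
```

With the input 1,2,3 / 1,3,4 / 4,5,6 this rule should keep 1,2,3 and 4,5,6. Which row to keep for a repeated key is exactly the open question raised above, so the ORDER clause is the part to adapt.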
