-
Eval UDF passing parameters
Dexin Wang 2010-12-07, 19:44
Hi,
This might be a dumb question. Is it possible to pass anything other than the input tuple to a UDF Eval function?
Basically in my UDF, I need to do some user info lookup. So the input will be:
(userid,f1,f2)
with this UDF, I want to convert it to something like
(userid,age,gender,location,f1,f2)
where in the UDF I do a DB lookup on the userid and return the user's info (age, gender, etc). But I don't necessarily want to pass back the same user info fields every time, e.g. sometimes I only want age.
I hope there is a way for me to tell the UDF that I only want "age", and sometimes "age, location", etc.
What's the best way to achieve this without having to write a separate UDF for every case?
Thanks. Dexin
-
Re: Eval UDF passing parameters
Zach Bailey 2010-12-07, 19:47
You can pass parameters via the UDF constructor. Pig hands constructor arguments over as strings, so declare them as String and quote them in the script. For example:

    public MyUDF(String includeAge, String includeGender)

then you would initialize it like so in your Pig script:

    define MY_UDF_ONLY_AGE com.package.MyUDF('true', 'false');

and use it like:

    data_with_age = FOREACH data GENERATE user_id, MY_UDF_ONLY_AGE(user_id);

HTH,
Zach
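To make the constructor approach concrete, here is a minimal sketch of a lookup class that is told which fields to return when it is constructed. Everything in it is an assumption for illustration: the class name UserInfoLookup, the single comma-separated argument, and the in-memory sample data standing in for the DB lookup. It uses plain Java types so it compiles without Pig on the classpath; a real UDF would extend org.apache.pig.EvalFunc<Tuple> and implement exec(Tuple) instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch. A real Pig UDF would extend org.apache.pig.EvalFunc<Tuple>
// and implement exec(Tuple); plain Java types are used here so it compiles alone.
class UserInfoLookup {
    private final String[] fields;

    // Receives the DEFINE argument, e.g. "age,location", as a single string.
    public UserInfoLookup(String fieldList) {
        this.fields = fieldList.trim().split("\\s*,\\s*");
    }

    // Stand-in for the DB lookup; returns made-up sample data for any userid.
    private static Map<String, Object> lookup(String userId) {
        Map<String, Object> info = new HashMap<>();
        info.put("age", 34);
        info.put("gender", "F");
        info.put("location", "Seattle");
        return info;
    }

    // Returns only the requested fields, in the order they were requested.
    // A field name the lookup does not know comes back as null.
    public List<Object> exec(String userId) {
        Map<String, Object> info = lookup(userId);
        List<Object> out = new ArrayList<>();
        for (String field : fields) {
            out.add(info.get(field));
        }
        return out;
    }
}
```

With a class shaped like this, one UDF covers every case: the script would define one alias per field selection, for instance an alias constructed with 'age' and another constructed with 'age,location', and the Java code stays the same.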
-
Re: Eval UDF passing parameters
Dexin Wang 2010-12-07, 20:09
ah nice. Thank you so much Zach!