Re: Seeing DataByteArray values for chararray field in 0.8.0
Thejas M Nair 2011-01-21, 19:39
On 1/19/11 3:18 AM, "Kaluskar, Sanjay" <[EMAIL PROTECTED]> wrote:

> I have script as follows:
>
> register lookup.jar;
> a = load 'lookupfile.dat' as(emp_id: chararray);
> b = foreach a generate flatten(com.mycompany.pig.lookup());

The UDF in the above statement does not take an argument. I assume you meant:
"b = foreach a generate flatten(com.mycompany.pig.lookup(emp_id));"

> My UDF works as expected in versions 0.5.0, 0.6.0 and 0.7.0. In version
> 0.8.0, I notice that the input tuple "input" has 1 field with value of
> type DataByteArray, whereas in earlier versions the value is of type
> String (as expected). Why is this different? I am assuming this is an
> intentional change in 0.8.0. Is there some way to force conversion from
> the raw data before the UDF is invoked, i.e., the old behaviour? What is
> the recommended approach in 0.8.0 for EvalFunc UDFs?

The tuple should contain a field of type CHARARRAY in 0.8 as well. I looked at
the explain plan of a similar query and it seemed to be correct. Can you
please open a JIRA and attach a simplified form of your UDF that reproduces
this problem?

Thanks,
Thejas
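
For reference, a minimal sketch of the script with the argument passed as assumed above, plus the two standard Pig commands for checking the declared schema and the plan. The jar, file, and UDF names are the ones from the original message; whether this avoids the DataByteArray behaviour in 0.8.0 is not confirmed here.

```pig
register lookup.jar;

-- emp_id is declared as chararray in the load schema
a = load 'lookupfile.dat' as (emp_id: chararray);

-- pass the field explicitly so the UDF receives it in its input tuple
b = foreach a generate flatten(com.mycompany.pig.lookup(emp_id));

-- show the schema Pig has recorded for a
describe a;

-- show the plans Pig generates for b
explain b;
```

Attaching the explain output to the JIRA together with the simplified UDF should make it easier to see where the field type diverges from the declared schema.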