Kaluskar, Sanjay 2011-01-19, 11:18
Re: Seeing DataByteArray values for chararray field in 0.8.0
Thejas M Nair 2011-01-21, 19:39
On 1/19/11 3:18 AM, "Kaluskar, Sanjay" <[EMAIL PROTECTED]> wrote:
> I have a script as follows:
> register lookup.jar;
> a = load 'lookupfile.dat' as(emp_id: chararray);
> b = foreach a generate flatten(com.mycompany.pig.lookup());
The UDF in the above statement does not take an argument; I assume you meant:
"b = foreach a generate flatten(com.mycompany.pig.lookup(emp_id));"
> My UDF works as expected in versions 0.5.0, 0.6.0 and 0.7.0. In version
> 0.8.0, I notice that the input tuple "input" has 1 field with value of
> type DataByteArray, whereas in earlier versions the value is of type
> String (as expected). Why is this different? I am assuming this is an
> intentional change in 0.8.0. Is there some way to force conversion from
> the raw data before the UDF is invoked, i.e., the old behaviour? What is
> the recommended approach in 0.8.0 for EvalFunc UDFs?
The tuple should contain a field of type CHARARRAY in 0.8 as well. I looked at
the explain plan of a similar query, and it seemed to be correct.
Can you please open a JIRA issue and attach a simplified version of your UDF
that reproduces this problem?
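Until the cause is tracked down, one hedged workaround is to make the UDF tolerate either representation of the field. An explicit cast in the script, e.g. `lookup((chararray)emp_id)`, may also be worth trying. The following is a standalone Java sketch, not Pig's API: so that it runs without the Pig jar, a plain byte[] stands in for DataByteArray (which wraps a byte array), and the class and method names (FieldNormalizer, toStringField) are hypothetical. It also assumes the raw bytes are UTF-8 text.

```java
import java.nio.charset.StandardCharsets;

/**
 * Standalone sketch of a defensive field conversion for a UDF.
 * Hypothetical helper: byte[] stands in for Pig's DataByteArray so the
 * example compiles and runs without Pig on the classpath.
 */
public class FieldNormalizer {

    /** Returns the field as a String whether it arrived as a String or as raw bytes. */
    public static String toStringField(Object field) {
        if (field == null) {
            return null;                        // keep nulls as nulls
        }
        if (field instanceof String) {
            return (String) field;              // already a chararray-style value
        }
        if (field instanceof byte[]) {
            // Raw bytes: decode assuming UTF-8 (an assumption of this sketch)
            return new String((byte[]) field, StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException(
            "Unexpected field type: " + field.getClass().getName());
    }

    public static void main(String[] args) {
        Object asString = "E100";
        Object asBytes = "E100".getBytes(StandardCharsets.UTF_8);
        // Both representations normalize to the same String
        System.out.println(toStringField(asString));
        System.out.println(toStringField(asBytes));
    }
}
```

Inside a real EvalFunc, the same check would be applied to input.get(0) in exec(Tuple), testing for DataByteArray and decoding the byte array returned by its get() method instead of checking for byte[].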