
Re: Seeing DataByteArray values for chararray field in 0.8.0

On 1/19/11 3:18 AM, "Kaluskar, Sanjay" <[EMAIL PROTECTED]> wrote:

> I have script as follows:
> register lookup.jar;
> a = load 'lookupfile.dat' as(emp_id: chararray);
> b = foreach a generate flatten(com.mycompany.pig.lookup());

The UDF in the statement above takes no argument. I assume you meant:
"b = foreach a generate flatten(com.mycompany.pig.lookup(emp_id));"

> My UDF works as expected in versions 0.5.0, 0.6.0 and 0.7.0. In version
> 0.8.0, I notice that the input tuple "input" has 1 field with value of
> type DataByteArray, whereas in earlier versions the value is of type
> String (as expected). Why is this different? I am assuming this is an
> intentional change in 0.8.0. Is there some way to force conversion from
> the raw data before the UDF is invoked, i.e., the old behaviour? What is
> the recommended approach in 0.8.0 for EvalFunc UDFs?

The tuple should contain a field of type CHARARRAY in 0.8 as well. I looked at
the explain plan of a similar query and it appeared to be correct.
Can you please open a JIRA and attach a simplified version of your UDF that
reproduces this problem?
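
On the question of forcing a conversion before the UDF is invoked: one possible stopgap is an explicit cast on the argument in the script itself. This is an untested sketch that reuses the names from the quoted script. Whether the cast changes what the UDF actually receives in 0.8.0 is an assumption, not something confirmed in this thread.

```pig
register lookup.jar;
a = load 'lookupfile.dat' as (emp_id: chararray);
-- Explicit cast to chararray on the UDF argument (assumed workaround, not verified on 0.8.0)
b = foreach a generate flatten(com.mycompany.pig.lookup((chararray) emp_id));
```

If the UDF still sees a DataByteArray with the cast in place, that detail would be worth including in the JIRA report.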