Pig, mail # user - validation on schema vs real object type when writing out udf schema


Yang 2012-09-05, 18:43
I just finished debugging a very nasty bug. I had

    GENERATE .....
        1 AS myfield
    ....

and later

    GENERATE
        myUdf(.... myfield....).

where myUdf simply copies myfield to its output and declares the output schema as chararray.
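The mismatch comes from the UDF passing the input object through untouched. The declared schema says chararray, which Pig represents as java.lang.String, but the value is still the java.lang.Integer produced by `1 AS myfield`. The sketch below is a plain-Java stand-in for the body of such a UDF, not code that uses Pig's EvalFunc. The class and method names are hypothetical. It contrasts passing the object through with converting it to the declared type.

```java
// Hypothetical plain-Java stand-in for a Pig UDF's exec() body; no Pig classes are used.
public class CopyFieldSketch {

    // What the UDF described above effectively does: return the input object unchanged.
    // For the constant 1 the runtime type stays Integer, even if the declared
    // output schema says chararray.
    public static Object copyUnchanged(Object field) {
        return field;
    }

    // A safer variant: convert the value so its runtime type (String) matches a
    // declared chararray output schema. Nulls are passed through as nulls.
    public static Object copyAsChararray(Object field) {
        return field == null ? null : field.toString();
    }

    public static void main(String[] args) {
        Object myfield = 1; // autoboxed to Integer, like "1 AS myfield"
        System.out.println(copyUnchanged(myfield).getClass().getSimpleName());   // Integer
        System.out.println(copyAsChararray(myfield).getClass().getSimpleName()); // String
    }
}
```

Converting inside the UDF keeps the runtime type consistent with whatever the UDF declares in its output schema, so a later load has nothing to disagree about.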
This worked fine when writing out the data with BinStorage(), but when I loaded the data back, I got an error saying that the internal data type is int while the schema declares chararray. The error message was not very clear, so it took me quite a while to figure out.
It would be nice if Pig validated the actual object type against the declared schema when the data is generated, so that this kind of error is caught earlier.
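A write-time check of the kind suggested could look roughly like the minimal sketch below, assuming it runs on each value before the value is written (for example, somewhere in a storage function's write path). The class `SchemaTypeCheck`, the method `check`, and the type-name table are hypothetical and not part of Pig. The table covers only a few scalar types, using the Java classes Pig documents for them, and `Map.of` assumes Java 9 or later.

```java
import java.util.Map;

// Hypothetical write-time check: compares a value's runtime class with the Java
// class that Pig documents for the declared field type. Not Pig's API.
public class SchemaTypeCheck {

    // Subset of Pig's documented type-to-Java-class mapping.
    private static final Map<String, Class<?>> PIG_TYPE_TO_CLASS = Map.of(
            "int", Integer.class,
            "long", Long.class,
            "float", Float.class,
            "double", Double.class,
            "chararray", String.class);

    // Throws with a message naming the field, the declared type and the actual class,
    // so a mismatch surfaces when the data is generated rather than when it is loaded.
    public static void check(String field, String declaredType, Object value) {
        if (value == null) {
            return; // nulls are allowed for any field type
        }
        Class<?> expected = PIG_TYPE_TO_CLASS.get(declaredType);
        if (expected == null) {
            throw new IllegalArgumentException("unsupported type in this sketch: " + declaredType);
        }
        if (!expected.isInstance(value)) {
            throw new IllegalStateException("field '" + field + "' is declared as " + declaredType
                    + " but holds a " + value.getClass().getName());
        }
    }

    public static void main(String[] args) {
        check("myfield", "chararray", "1"); // passes: String matches chararray
        try {
            check("myfield", "chararray", 1); // Integer vs chararray
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking every field of every tuple adds per-record cost, so a check like this might reasonably be optional rather than always on.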

Thanks,
Yang