Re: Schema validation of a field's default values
Doug Cutting 2012-11-05, 18:46
I'd welcome improvements to default value validation in Avro. For
performance, I think this should be an explicit, separate operation from
parsing schemas. But we might invoke it on schemas at various points,
e.g., when creating a file. If you are able, please contribute your
implementation by filing an issue in Avro's Jira.
On Sat, Nov 3, 2012 at 9:48 AM, Mark Hayes <[EMAIL PROTECTED]> wrote:
> On Mon, Oct 29, 2012 at 12:32 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> No, I don't know of a default value validator that's been implemented
>> yet. It would be great to have one.
>> I think this would recursively walk a schema. Whenever a non-null
>> default value is found it could call ResolvingGrammarDecoder#encode().
>> That's what interprets JSON default values. (Perhaps this logic
>> should be moved, though.)
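The traversal suggested above can be sketched with stand-in types (ToySchema and ToyField are illustrative names, not Avro's API); a real validator would walk org.apache.avro.Schema and hand each non-null default to ResolvingGrammarDecoder#encode instead of just collecting it:

```java
import java.util.List;

// Toy stand-ins (illustrative names, not Avro's API) for a schema tree;
// a real validator would walk org.apache.avro.Schema instead.
class ToyField {
    final String name; final ToySchema schema; final Object defaultValue;
    ToyField(String n, ToySchema s, Object d) { name = n; schema = s; defaultValue = d; }
}

class ToySchema {
    final List<ToyField> fields;       // non-null only for record schemas
    ToySchema(List<ToyField> fields) { this.fields = fields; }

    // Recursively visit every field; collect each non-null default so a
    // checker (ResolvingGrammarDecoder#encode in real Avro) could vet it.
    static void walk(ToySchema s, StringBuilder found) {
        if (s.fields == null) return;                  // leaf schema: nothing to do
        for (ToyField f : s.fields) {
            if (f.defaultValue != null) {
                found.append(f.name).append('=').append(f.defaultValue).append(';');
            }
            walk(f.schema, found);                     // recurse into nested records
        }
    }
}
```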
> Thanks for the reply, Doug.
> I did find ResolvingGrammarDecoder.encode (I saw that it is called by the
> builders) and was using it as you described, but I ran into limitations:
> + When the field type is an array, map or record, values of the
> wrong JSON type (not array or object) are translated to an empty array,
> map or record. For example, specifying a default of 0, null or "" results
> in an empty array, map or record.
> + For all numeric Avro types (int, long, float and double) the default
> value may be of any JSON numeric type, and the JSON value will be coerced
> to the Avro type even though part of the value may be lost or truncated.
> For example, a long default value that exceeds 32 bits will be truncated
> if the field is of type int.
> + The byte array length is not validated for a fixed type.
> + For nested fields and certain types (e.g., enums), a cryptic error
> message is often produced that does not name the offending field.
> These deficiencies can mask errors made by the user when defining
> a default value. This is important to our application.
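The numeric coercion described in the second item above behaves like Java's narrowing cast; a plain-Java illustration (not Avro code) of the hazard:

```java
// Plain Java illustration (not Avro code): narrowing a long default to
// int keeps only the low 32 bits, silently changing the value.
public class Truncation {
    static int coerce(long defaultValue) {
        return (int) defaultValue;               // the same loss the coercion allows
    }

    public static void main(String[] args) {
        System.out.println(coerce(4294967296L)); // 2^32 → 0
        System.out.println(coerce(2147483648L)); // 2^31 → -2147483648
    }
}
```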
> To compensate for these deficiencies we implemented our own checking that
> is more strict than Avro's. To do this, we serialize the default value
> using our own JSON serializer in a special mode where default values are
> applied. Any errors during serialization indicate that the default value
> is invalid.
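The extra strictness described can be sketched stdlib-only (isValid and the type names here are hypothetical, not Avro's API or the poster's actual code): a default of the wrong JSON type is rejected outright instead of being coerced.

```java
import java.util.List;
import java.util.Map;

// Hypothetical strict checker (not Avro's API): reject type mismatches
// rather than coercing them to empty or truncated values.
public class StrictDefaults {
    static boolean isValid(String avroType, Object jsonDefault) {
        if ("int".equals(avroType))    return jsonDefault instanceof Integer;   // no narrowing
        if ("long".equals(avroType))   return jsonDefault instanceof Integer
                                           || jsonDefault instanceof Long;      // widening only
        if ("string".equals(avroType)) return jsonDefault instanceof String;
        if ("array".equals(avroType))  return jsonDefault instanceof List;      // 0, null, "" fail
        if ("map".equals(avroType))    return jsonDefault instanceof Map;
        return false;                                                           // unknown: fail loudly
    }
}
```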
> Something similar might be done in Avro itself, for example, if the JSON
> encoder were made to operate in a special mode where default values are