Re: Schema validation of a field's default values
On Mon, Oct 29, 2012 at 12:32 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> No, I don't know of a default value validator that's been implemented
> yet.  It would be great to have one.
>
> I think this would recursively walk a schema.  Whenever a non-null
> default value is found it could call ResolvingGrammarGenerator#encode().
> That's what interprets JSON default values.  (Perhaps this logic
> should be moved, though.)

Thanks for the reply, Doug.

I did find ResolvingGrammarGenerator.encode (I saw that it is called by the
builders) and was using it as you described, but I ran into limitations:

+ When the field type is an array, map or record, a default of the wrong JSON
type (anything other than an array or object) is silently translated to an
empty array, map or record.  For example, specifying a default of 0, null
or "" results in an empty array, map or record.

+ For all numeric Avro types (int, long, float and double) the default
value may be of any JSON numeric type, and the JSON value is coerced to the
Avro type even if part of the value is lost or truncated.  For example, a
long default value that exceeds 32 bits is truncated if the field is of
type int.

+ For a fixed type, the length of the default byte array is not validated
against the declared size.

+ For nested fields and certain types (e.g., enums) the error that is
output is often cryptic and does not name the offending field.
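
The numeric case can be illustrated without Avro at all.  The following is
a minimal sketch, assuming the coercion behaves like a plain Java narrowing
cast; the class and method names are hypothetical and are not part of Avro:

```java
final class NarrowingDemo {
    // Lenient: a plain narrowing cast silently drops the high-order bits.
    static int lenientInt(long jsonValue) {
        return (int) jsonValue;
    }

    // Strict: reject values that do not fit in an int, and name the field.
    static int strictInt(String fieldName, long jsonValue) {
        if (jsonValue < Integer.MIN_VALUE || jsonValue > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Field '" + fieldName + "': default " + jsonValue
                    + " does not fit in int");
        }
        return (int) jsonValue;
    }

    public static void main(String[] args) {
        long tooBig = 4294967297L; // 2^32 + 1
        // The cast keeps only the low 32 bits, so this prints 1.
        System.out.println(lenientInt(tooBig));
        try {
            strictInt("count", tooBig);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The strict variant fails loudly and includes the field name, which is the
behaviour the lenient path lacks.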

These deficiencies can mask errors made by the user when defining a default
value, and catching such errors is important to our application.

To compensate for these deficiencies we implemented our own checking that
is stricter than Avro's.  To do this, we serialize the default value
using our own JSON serializer in a special mode where default values are
applied.  Any errors during serialization indicate that the default value
is invalid.
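
As an illustration only, a strict check along these lines might look like
the sketch below.  It is not the serializer mentioned above: the Type and
Kind classes are hypothetical stand-ins rather than Avro classes, JSON
values are assumed to arrive as plain Java objects (Long, String, List,
Map or null), and a fixed default is assumed to be a string with one
character per byte.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class StrictDefaultCheck {
    enum Kind { INT, LONG, STRING, ARRAY, MAP, FIXED }

    /** Hypothetical, minimal stand-in for a schema; this is not Avro's Schema. */
    static final class Type {
        final Kind kind;
        final Type element;  // element type for ARRAY, value type for MAP
        final int fixedSize; // byte length for FIXED, otherwise 0

        private Type(Kind kind, Type element, int fixedSize) {
            this.kind = kind;
            this.element = element;
            this.fixedSize = fixedSize;
        }
        static Type of(Kind kind) { return new Type(kind, null, 0); }
        static Type arrayOf(Type e) { return new Type(Kind.ARRAY, e, 0); }
        static Type mapOf(Type v) { return new Type(Kind.MAP, v, 0); }
        static Type fixed(int size) { return new Type(Kind.FIXED, null, size); }
    }

    /** Returns one message per problem; an empty list means the default passed. */
    static List<String> check(String path, Type type, Object value) {
        List<String> errors = new ArrayList<>();
        walk(path, type, value, errors);
        return errors;
    }

    // Recursively walks the type and the value together, recording the path
    // of every mismatch so that nested errors name the offending field.
    private static void walk(String path, Type type, Object value, List<String> errors) {
        switch (type.kind) {
            case INT:
                if (!(value instanceof Long)) {
                    errors.add(path + ": expected an integer, got " + describe(value));
                } else {
                    long n = (Long) value;
                    if (n < Integer.MIN_VALUE || n > Integer.MAX_VALUE) {
                        errors.add(path + ": " + n + " does not fit in int");
                    }
                }
                break;
            case LONG:
                if (!(value instanceof Long)) {
                    errors.add(path + ": expected an integer, got " + describe(value));
                }
                break;
            case STRING:
                if (!(value instanceof String)) {
                    errors.add(path + ": expected a string, got " + describe(value));
                }
                break;
            case FIXED:
                if (!(value instanceof String)) {
                    errors.add(path + ": expected a string, got " + describe(value));
                } else if (((String) value).length() != type.fixedSize) {
                    errors.add(path + ": fixed size is " + type.fixedSize
                        + " but default has " + ((String) value).length() + " bytes");
                }
                break;
            case ARRAY:
                if (!(value instanceof List)) {
                    errors.add(path + ": expected a JSON array, got " + describe(value));
                } else {
                    List<?> items = (List<?>) value;
                    for (int i = 0; i < items.size(); i++) {
                        walk(path + "[" + i + "]", type.element, items.get(i), errors);
                    }
                }
                break;
            case MAP:
                if (!(value instanceof Map)) {
                    errors.add(path + ": expected a JSON object, got " + describe(value));
                } else {
                    for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                        walk(path + "." + e.getKey(), type.element, e.getValue(), errors);
                    }
                }
                break;
        }
    }

    private static String describe(Object v) {
        return v == null ? "null" : v.getClass().getSimpleName();
    }
}
```

With this sketch, a default of 0 for an array field is reported as an
error instead of becoming an empty array, and an out-of-range element in a
nested array is reported with its path (for example, "f[1]").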

Something similar might be done in Avro itself, for example, if the JSON
encoder were made to operate in a special mode where default values are
applied.

--mark