Avro user mailing list


Re: Schema validation of a field's default values
Mark,

I'd welcome improvements to default value validation in Avro.  For
performance, I think this should be an explicit, separate operation from
parsing schemas.  But we might invoke it on schemas at various points,
e.g., when creating a file.  If you are able, please contribute your
implementation by filing an issue in Avro's Jira.
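One way to read this design is as a minimal Python sketch. It assumes schemas are handled as plain JSON dicts. Parsing does no default-value checks, and validation is a separate, explicit step that runs once, at a hypothetical "create a file" hook. `parse_schema`, `int_defaults_fit` and `create_file` are illustrative names, not Avro APIs. The example validator only looks at top-level `int` fields.

```python
import json
from typing import Any, Callable, Dict, List

# A validator inspects a parsed schema and raises ValueError on a bad default.
Validator = Callable[[Dict[str, Any]], None]


def parse_schema(text: str) -> Dict[str, Any]:
    """Parse only: no default-value checks, so parsing stays cheap."""
    return json.loads(text)


def int_defaults_fit(schema: Dict[str, Any]) -> None:
    """Example validator (top-level fields only): int defaults must fit in 32 bits."""
    for field in schema.get("fields", []):
        if field.get("type") == "int" and "default" in field:
            d = field["default"]
            # bool is a subclass of int in Python, so rule it out explicitly.
            if isinstance(d, bool) or not isinstance(d, int) or not -2**31 <= d < 2**31:
                raise ValueError(f"{schema['name']}.{field['name']}: bad int default {d!r}")


def create_file(schema: Dict[str, Any], validators: List[Validator]) -> Dict[str, Any]:
    """Hypothetical 'create a data file' step: the one place validation is invoked."""
    for validate in validators:
        validate(schema)
    return {"schema": schema, "blocks": []}  # stand-in for a real file handle
```

With this split, `parse_schema` accepts a record whose `int` field defaults to 4294967296. `create_file(schema, [int_defaults_fit])` rejects it with a `ValueError` that names `R.n`. Callers that do not need the check pay nothing for it.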

Thanks,

Doug
On Sat, Nov 3, 2012 at 9:48 AM, Mark Hayes <[EMAIL PROTECTED]> wrote:

> On Mon, Oct 29, 2012 at 12:32 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>> No, I don't know of a default value validator that's been implemented
>> yet.  It would be great to have one.
>>
>> I think this would recursively walk a schema.  Whenever a non-null
>> default value is found, it could call ResolvingGrammarDecoder#encode().
>> That's what interprets JSON default values.  (Perhaps this logic
>> should be moved, though.)
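The recursive walk described above could look roughly like the following hypothetical Python sketch over the schema's JSON form. It only collects each non-null default together with its field path and field schema. The actual check is left to the caller; in Java Avro that is the `encode()` call mentioned above. Named-type references (a bare string naming an earlier record) are not resolved, and `iter_defaults` is an illustrative name.

```python
from typing import Any, Iterator, Tuple


def iter_defaults(schema: Any, path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Recursively yield (field_path, field_schema, default) for each non-null default."""
    if isinstance(schema, list):  # union: walk every branch
        for branch in schema:
            yield from iter_defaults(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            prefix = path or schema.get("name", "?")
            for field in schema.get("fields", []):
                field_path = f"{prefix}.{field['name']}"
                # .get() treats an explicit JSON null like a missing default.
                if field.get("default") is not None:
                    yield field_path, field["type"], field["default"]
                yield from iter_defaults(field["type"], field_path)
        elif kind == "array":
            yield from iter_defaults(schema["items"], path)
        elif kind == "map":
            yield from iter_defaults(schema["values"], path)
        elif isinstance(kind, (dict, list)):  # {"type": {...}} wrapper
            yield from iter_defaults(kind, path)
    # Plain strings (primitives or named-type references) carry no fields.
```

Carrying the field path along makes it possible to name the offending field in any error message.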
>
>
> Thanks for the reply Doug.
>
> I did find ResolvingGrammarDecoder.encode (I saw that it is called by the
> builders) and was using it as you described, but I ran into limitations:
>
> + When the field type is an array, map or record, values of the
> wrong JSON type (not an array or object) are silently converted to an
> empty array, map or record.  For example, a default of 0, null or ""
> results in an empty array, map or record.
>
> + For all numeric Avro types (int, long, float and double) the default
> value may be of any JSON numeric type, and the JSON value is coerced to
> the Avro type even though part of the value may be lost or truncated.
> For example, a long default value that exceeds 32 bits is truncated if
> the field's type is int.
>
> + The byte array length is not validated for a fixed type.
>
> + For nested fields and certain types (e.g., enums), the error output is
> often cryptic and does not contain the name of the offending field.
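A stricter checker that addresses each point above might look like the following Python sketch. It assumes the default-value encodings from the Avro specification: bytes and fixed defaults are JSON strings of code points 0-255, and a union's default matches the union's first branch. It rejects wrong JSON types instead of substituting empty containers. It also rejects out-of-range int and long values, checks fixed sizes, and puts the field path in every error. It does not check float precision or resolve named-type references. `check_default`, `validate_schema_defaults` and `DefaultValueError` are hypothetical names.

```python
from typing import Any, NoReturn


class DefaultValueError(ValueError):
    """Raised when a field's default does not match the field's schema."""


def check_default(schema: Any, value: Any, path: str) -> None:
    """Strictly check one JSON default value against an Avro schema (JSON form)."""

    def fail(msg: str) -> NoReturn:
        # Every error names the offending field path.
        raise DefaultValueError(f"{path}: {msg} (got {value!r})")

    if isinstance(schema, list):
        # Avro spec: a union's default must match the union's first branch.
        if not schema:
            fail("empty union")
        return check_default(schema[0], value, path)
    kind = schema["type"] if isinstance(schema, dict) else schema
    if isinstance(kind, (dict, list)):  # {"type": {...}} wrapper
        return check_default(kind, value, path)

    if kind == "null":
        if value is not None:
            fail("expected null")
    elif kind == "boolean":
        if not isinstance(value, bool):
            fail("expected true or false")
    elif kind in ("int", "long"):
        bits = 32 if kind == "int" else 64
        # No coercion: reject JSON floats and anything outside the type's range.
        if isinstance(value, bool) or not isinstance(value, int):
            fail(f"expected a JSON integer for {kind}")
        if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
            fail(f"out of range for {kind}")
    elif kind in ("float", "double"):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            fail(f"expected a JSON number for {kind}")
    elif kind in ("string", "bytes", "fixed"):
        if not isinstance(value, str):
            fail("expected a JSON string")
        if kind != "string" and any(ord(c) > 255 for c in value):
            fail("bytes/fixed defaults may only use code points 0-255")
        if kind == "fixed" and len(value) != schema["size"]:
            fail(f"expected exactly {schema['size']} bytes")
    elif kind == "enum":
        if not isinstance(value, str) or value not in schema["symbols"]:
            fail("not one of the enum symbols")
    elif kind == "array":
        if not isinstance(value, list):
            fail("expected a JSON array")
        for i, item in enumerate(value):
            check_default(schema["items"], item, f"{path}[{i}]")
    elif kind == "map":
        if not isinstance(value, dict):
            fail("expected a JSON object")
        for key, item in value.items():
            check_default(schema["values"], item, f"{path}[{key!r}]")
    elif kind == "record":
        if not isinstance(value, dict):
            fail("expected a JSON object")
        for field in schema["fields"]:
            sub = f"{path}.{field['name']}"
            if field["name"] in value:
                check_default(field["type"], value[field["name"]], sub)
            elif "default" not in field:
                raise DefaultValueError(f"{sub}: missing and has no default")
    else:
        fail(f"unsupported type or unresolved named reference {kind!r}")


def validate_schema_defaults(schema: Any, path: str = "") -> None:
    """Walk a schema and check every field default, including explicit nulls."""
    if isinstance(schema, list):
        for branch in schema:
            validate_schema_defaults(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            prefix = path or schema.get("name", "?")
            for field in schema["fields"]:
                field_path = f"{prefix}.{field['name']}"
                if "default" in field:
                    check_default(field["type"], field["default"], field_path)
                validate_schema_defaults(field["type"], field_path)
        elif kind == "array":
            validate_schema_defaults(schema["items"], path)
        elif kind == "map":
            validate_schema_defaults(schema["values"], path)
        elif isinstance(kind, (dict, list)):
            validate_schema_defaults(kind, path)
```

For instance, an array field with `"default": 0` now fails with `R.tags: expected a JSON array (got 0)` instead of quietly becoming an empty array.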
>
> These deficiencies can mask errors made by the user when defining
> a default value.  This is important to our application.
>
> To compensate for these deficiencies we implemented our own checking,
> which is stricter than Avro's.  To do this, we serialize the default value
> using our own JSON serializer in a special mode where default values are
> applied.  Any error during serialization indicates that the default value
> is invalid.
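The serializer-based check can be approximated with a strict encoder that has a hypothetical `apply_defaults` mode. In that mode, record fields missing from the datum are filled from their defaults, so encoding one field's default also exercises the defaults nested inside it. Any `EncodeError` marks the default as invalid. This is a Python sketch, not the serializer from the thread. For brevity it handles only null, boolean, int, long, string, array, map and record, and it produces plain JSON values rather than Avro's JSON encoding. It tries only the first branch of a union, which is the branch a default must match. `encode`, `check_field_defaults` and `EncodeError` are illustrative names.

```python
from typing import Any, Dict


class EncodeError(ValueError):
    """Raised when a value cannot be encoded for its schema."""


def encode(schema: Any, datum: Any, path: str, apply_defaults: bool) -> Any:
    """Strictly turn `datum` into a plain JSON-ready value for `schema`.

    With apply_defaults=True, record fields missing from `datum` are filled in
    from the field's "default" and then encoded like any other value.
    """
    if isinstance(schema, list):
        # Simplification: only the first union branch (the one a default must match).
        return encode(schema[0], datum, path, apply_defaults)
    kind = schema["type"] if isinstance(schema, dict) else schema
    if isinstance(kind, (dict, list)):  # {"type": {...}} wrapper
        return encode(kind, datum, path, apply_defaults)
    if kind == "null" and datum is None:
        return None
    if kind == "boolean" and isinstance(datum, bool):
        return datum
    if kind in ("int", "long") and isinstance(datum, int) and not isinstance(datum, bool):
        bits = 32 if kind == "int" else 64
        if -(2 ** (bits - 1)) <= datum < 2 ** (bits - 1):
            return datum  # in range: nothing would be truncated
    if kind == "string" and isinstance(datum, str):
        return datum
    if kind == "array" and isinstance(datum, list):
        return [encode(schema["items"], x, f"{path}[{i}]", apply_defaults)
                for i, x in enumerate(datum)]
    if kind == "map" and isinstance(datum, dict):
        return {k: encode(schema["values"], v, f"{path}[{k!r}]", apply_defaults)
                for k, v in datum.items()}
    if kind == "record" and isinstance(datum, dict):
        out: Dict[str, Any] = {}
        for field in schema["fields"]:
            name = field["name"]
            sub = f"{path}.{name}"
            if name in datum:
                value = datum[name]
            elif apply_defaults and "default" in field:
                value = field["default"]  # the special "defaults applied" mode
            else:
                raise EncodeError(f"{sub}: missing, and no default was applied")
            out[name] = encode(field["type"], value, sub, apply_defaults)
        return out
    raise EncodeError(f"{path}: {datum!r} is not a valid {kind}")


def check_field_defaults(record_schema: Dict[str, Any]) -> None:
    """Encode each top-level field's default; an EncodeError means it is invalid."""
    for field in record_schema["fields"]:
        if "default" in field:
            path = f"{record_schema['name']}.{field['name']}"
            encode(field["type"], field["default"], path, apply_defaults=True)
```

A record field whose default is `{}` therefore has each of its sub-fields' defaults encoded too. An int sub-field defaulting to `2 ** 40` is reported as `R.inner.n`, not silently truncated. Nested record schemas whose defaults are never reached this way still need a recursive walk of the schema.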
>
> Something similar might be done in Avro itself, for example, if the JSON
> encoder were made to operate in a special mode where default values are
> applied.
>
> --mark
>