Avro >> mail # user >> Schema validation of a field's default values


Re: Schema validation of a field's default values
Mark,

I'd welcome improvements to default value validation in Avro.  For
performance, I think this should be an explicit, separate operation from
parsing schemas.  But we might invoke it on schemas at various points,
e.g., when creating a file.  If you are able, please contribute your
implementation by filing an issue in Avro's Jira.

Thanks,

Doug
On Sat, Nov 3, 2012 at 9:48 AM, Mark Hayes <[EMAIL PROTECTED]> wrote:

> On Mon, Oct 29, 2012 at 12:32 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>> No, I don't know of a default value validator that's been implemented
>> yet.  It would be great to have one.
>>
>> I think this would recursively walk a schema.  Whenever a non-null
>> default value is found it could call ResolvingGrammarGenerator#encode().
>> That's what interprets JSON default values.  (Perhaps this logic
>> should be moved, though.)
>
>
> Thanks for the reply, Doug.
>
> I did find ResolvingGrammarGenerator.encode (I saw that it is called by the
> builders) and was using it as you described, but I ran into limitations:
>
> + When the field type is an array, map or record, a default value of the
> wrong JSON type (not an array or object) is silently translated to an
> empty array, map or record.  For example, specifying a default of 0, null
> or "" results in an empty array, map or record.
>
> + For all numeric Avro types (int, long, float and double) the default
> value may be of any JSON numeric type, and the JSON value is coerced to
> the Avro type even though part of the value may be lost or truncated.
> For example, a long default value that does not fit in 32 bits will be
> truncated if the field is of type int.
>
> + The byte array length is not validated for a fixed type.
>
> + For nested fields and certain types (e.g., enums) the error message is
> often cryptic and does not contain the name of the offending field.
>
> These deficiencies can mask errors made by the user when defining
> a default value.  Catching such errors is important to our application.
>
> To compensate for these deficiencies we implemented our own checking that
> is more strict than Avro's.  To do this, we serialize the default value
> using our own JSON serializer in a special mode where default values are
> applied.  Any errors during serialization indicate that the default value
> is invalid.
>
> Something similar might be done in Avro itself, for example, if the JSON
> encoder were made to operate in a special mode where default values are
> applied.
>
> --mark
>
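
The stricter checks listed in Mark's message can be illustrated outside of Avro. What follows is a minimal standalone Python sketch, not Avro's API and not Mark's implementation. It operates on the schema as parsed JSON, and the names check_default, validate_schema_defaults and DefaultValueError are hypothetical. It assumes that all types are defined inline (references to named types are not resolved), that a union default must match the first branch of the union, and that bytes and fixed defaults are JSON strings with one character per byte.

```python
import json

INT_RANGES = {"int": (-2**31, 2**31 - 1), "long": (-2**63, 2**63 - 1)}


class DefaultValueError(ValueError):
    """Carries a message prefixed with the path of the offending field."""


def check_default(schema, value, path):
    """Strictly check one JSON default `value` against a parsed `schema`."""
    def fail(msg):
        raise DefaultValueError(f"{path}: {msg}")

    if isinstance(schema, list):  # union: default must match the first branch
        return check_default(schema[0], value, path)
    kind = schema if isinstance(schema, str) else schema["type"]

    if kind == "null":
        if value is not None:
            fail("expected null")
    elif kind == "boolean":
        if not isinstance(value, bool):
            fail("expected a boolean")
    elif kind in INT_RANGES:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            fail(f"expected an integer for {kind}")
        lo, hi = INT_RANGES[kind]
        if not lo <= value <= hi:  # reject instead of truncating
            fail(f"{value} is out of range for {kind}")
    elif kind in ("float", "double"):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            fail(f"expected a number for {kind}")
    elif kind in ("string", "bytes", "fixed"):
        if not isinstance(value, str):
            fail(f"expected a JSON string for {kind}")
        if kind == "fixed" and len(value) != schema["size"]:
            fail(f"fixed size is {schema['size']}, got {len(value)} bytes")
    elif kind == "enum":
        if value not in schema["symbols"]:
            fail(f"{value!r} is not one of {schema['symbols']}")
    elif kind == "array":
        if not isinstance(value, list):  # no silent empty array
            fail("expected a JSON array")
        for i, item in enumerate(value):
            check_default(schema["items"], item, f"{path}[{i}]")
    elif kind == "map":
        if not isinstance(value, dict):
            fail("expected a JSON object")
        for key, item in value.items():
            check_default(schema["values"], item, f"{path}[{key!r}]")
    elif kind == "record":
        if not isinstance(value, dict):
            fail("expected a JSON object")
        for field in schema["fields"]:
            sub = f"{path}.{field['name']}"
            if field["name"] in value:
                check_default(field["type"], value[field["name"]], sub)
            elif "default" not in field:
                raise DefaultValueError(f"{sub}: missing and has no default")
    else:
        fail(f"unsupported or unresolved type {kind!r}")


def validate_schema_defaults(schema, path=""):
    """Recursively walk `schema`; return one message per invalid default."""
    errors = []
    if isinstance(schema, list):
        for branch in schema:
            errors += validate_schema_defaults(branch, path)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            for field in schema["fields"]:
                sub = f"{path}.{field['name']}" if path else field["name"]
                if "default" in field:
                    try:
                        check_default(field["type"], field["default"], sub)
                    except DefaultValueError as err:
                        errors.append(str(err))
                errors += validate_schema_defaults(field["type"], sub)
        elif kind == "array":
            errors += validate_schema_defaults(schema["items"], path + "[]")
        elif kind == "map":
            errors += validate_schema_defaults(schema["values"], path + "{}")
    return errors


# Example schema exercising three of the cases from the message above.
SCHEMA = json.loads("""
{"type": "record", "name": "Example", "fields": [
  {"name": "count", "type": "int", "default": 4294967301},
  {"name": "tags", "type": {"type": "array", "items": "string"}, "default": 0},
  {"name": "id", "type": {"type": "fixed", "name": "Id", "size": 4}, "default": "ab"},
  {"name": "total", "type": "long", "default": 4294967301}
]}
""")

for message in validate_schema_defaults(SCHEMA):
    print(message)
```

In this sketch validation is a separate pass over an already parsed schema, in line with Doug's suggestion to keep it out of schema parsing. Every message begins with the path of the offending field, so nested failures remain traceable. The same value, 4294967301, is rejected for the int field and accepted for the long field.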