-
Schema validation of a field's default values
Mark Hayes 2012-10-20, 00:19
Hi,
It looks like the Schema parser doesn't check a field's default value against the field's type. So for example, you can define a string default value for a field of type int. Since the default value is only used in certain circumstances for schema resolution, such an error might not easily be caught during testing, so the danger is that it would crop up at runtime.
Am I correct, so far?
If so, does anyone know of a facility in the Avro library, or another library, for checking the schema to ensure that default values are of the proper type? I wasn't able to find one, so I'm considering implementing one myself.
Thanks in advance, --mark
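To make the failure mode described above concrete, here is a minimal sketch in plain Python. It does not use the Avro library: the schema, the JSON_TYPES table and the mismatched_defaults helper are all hypothetical names invented for illustration, and only top-level fields with primitive types are looked at.

```python
import json

# Hypothetical record schema: "count" is declared int but its default is a string.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Example",
  "fields": [
    {"name": "count", "type": "int", "default": "zero"},
    {"name": "label", "type": "string", "default": ""}
  ]
}
"""

# JSON (Python) types this sketch accepts for each primitive Avro type.
JSON_TYPES = {
    "null": type(None),
    "boolean": bool,
    "int": int,
    "long": int,
    "float": (int, float),
    "double": (int, float),
    "string": str,
    "bytes": str,
}

def mismatched_defaults(schema):
    """Return names of top-level fields whose default has the wrong JSON type."""
    bad = []
    for field in schema["fields"]:
        if "default" not in field:
            continue
        ftype = field["type"]
        expected = JSON_TYPES.get(ftype) if isinstance(ftype, str) else None
        if expected is None:
            continue  # complex, union or named types are out of scope here
        value = field["default"]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if isinstance(value, bool) and ftype != "boolean":
            bad.append(field["name"])
        elif not isinstance(value, expected):
            bad.append(field["name"])
    return bad

schema = json.loads(SCHEMA_JSON)      # the JSON itself is perfectly valid
print(mismatched_defaults(schema))    # prints ['count']
```

The point of the example is only that the mistake is invisible at the JSON level; something has to compare each default against its declared type explicitly.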
-
Re: Schema validation of a field's default values
Doug Cutting 2012-10-29, 19:32
On Fri, Oct 19, 2012 at 5:19 PM, Mark Hayes <[EMAIL PROTECTED]> wrote:
> It looks like the Schema parser doesn't check a field's default value
> against the field's type. So for example, you can define a string default
> value for a field of type int. Since the default value is only used in
> certain circumstances for schema resolution, such an error might not easily
> be caught during testing, so the danger is that it would crop up at runtime.
>
> Am I correct, so far?
Yes, that's correct.
> If so, does anyone know of facility in the Avro library, or another library,
> for checking the schema to ensure that default values are of the proper
> type? I wasn't able to find one, so I'm considering implementing one
> myself.
No, I don't know of a default value validator that's been implemented yet. It would be great to have one.
I think this would recursively walk a schema. Whenever a non-null default value is found it could call ResolvingGrammarDecoder#encode(). That's what interprets JSON default values. (Perhaps this logic should be moved, though.)
Doug
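The recursive walk suggested above can be sketched without the Avro library, operating directly on a schema that has already been parsed from JSON into Python dicts and lists. The function name iter_defaults and the path notation are assumptions made for this illustration. Note one deliberate difference: in the Java API a null default means "no default", whereas this sketch treats any field carrying a "default" key as having one.

```python
def iter_defaults(schema, path=""):
    """Yield (field_path, field_type, default) for every field declaring a default."""
    if isinstance(schema, list):
        # Union: visit every branch, since any of them may contain records.
        for branch in schema:
            yield from iter_defaults(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                field_path = f"{path}.{field['name']}" if path else field["name"]
                if "default" in field:
                    yield field_path, field["type"], field["default"]
                # Descend into the field's own type to find nested records.
                yield from iter_defaults(field["type"], field_path)
        elif kind == "array":
            yield from iter_defaults(schema["items"], path + "[]")
        elif kind == "map":
            yield from iter_defaults(schema["values"], path + "{}")
    # Plain strings (primitive names or references to named types) need no walk.

# Hypothetical schema with one top-level and one nested default.
schema = {
    "type": "record", "name": "Outer",
    "fields": [
        {"name": "id", "type": "int", "default": "oops"},
        {"name": "inner", "type": {
            "type": "record", "name": "Inner",
            "fields": [
                {"name": "tags",
                 "type": {"type": "array", "items": "string"},
                 "default": []},
            ],
        }},
    ],
}

for field_path, field_type, default in iter_defaults(schema):
    print(field_path, field_type, default)
```

Each yielded triple is what a validator would then hand to whatever routine interprets the default, which in the Java library would be the ResolvingGrammarDecoder#encode() call mentioned above.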
-
Re: Schema validation of a field's default values
Mark Hayes 2012-11-03, 16:48
On Mon, Oct 29, 2012 at 12:32 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> No, I don't know of a default value validator that's been implemented
> yet. It would be great to have one.
>
> I think this would recursively walk a schema. Whenever a non-null
> default value is found it could call ResolvingGrammarDecoder#encode().
> That's what interprets Json default values. (Perhaps this logic
> should be moved, though.)
Thanks for the reply, Doug.
I did find ResolvingGrammarDecoder.encode (I saw that it is called by the builders) and was using it as you described, but I ran into limitations:
+ When the field type is an array, map or record, values of the wrong JSON type (not array or object) are translated to an empty array, map or record. For example, specifying a default of 0, null or "" results in an empty array, map or record.
+ For all numeric Avro types (int, long, float and double) the default value may be of any JSON numeric type, and the JSON value will be coerced to the Avro type even though part of the value may be lost or truncated. For example, a long default value that exceeds 32 bits will be truncated if the field is type int.
+ The byte array length is not validated for a fixed type.
+ For nested fields and certain types (e.g., enums) a cryptic error is often output that does not contain the name of the offending field.
These deficiencies can mask errors made by the user when defining a default value. Catching such errors is important to our application.
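As an illustration of what stricter rules for the four limitations listed above could look like, here is a rough sketch in plain Python. It is not the implementation described in this thread and not Avro's API: check_default and DefaultError are hypothetical names, the schema is assumed to be already parsed from JSON, a union default is checked against the first branch only (the rule I understand the Avro specification to state), and string, boolean, null, bytes and named-type references are left unchecked.

```python
INT_RANGE = (-2**31, 2**31 - 1)
LONG_RANGE = (-2**63, 2**63 - 1)

class DefaultError(ValueError):
    """Raised with the offending field path when a default is invalid."""

def check_default(schema, value, path):
    """Raise DefaultError naming `path` if `value` is not a valid default."""
    if isinstance(schema, list):
        # Assumption: a union default must match the first branch.
        return check_default(schema[0], value, path)
    kind = schema["type"] if isinstance(schema, dict) else schema

    def fail(msg):
        raise DefaultError(f"field '{path}': {msg} (got {value!r})")

    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if kind in ("int", "long"):
        lo, hi = INT_RANGE if kind == "int" else LONG_RANGE
        if not is_int:
            fail(f"{kind} default must be a JSON integer")
        if not lo <= value <= hi:
            fail(f"{kind} default is out of range")       # no silent truncation
    elif kind in ("float", "double"):
        if not (is_int or isinstance(value, float)):
            fail(f"{kind} default must be a JSON number")
    elif kind == "array":
        if not isinstance(value, list):
            fail("array default must be a JSON array")    # 0, null, "" rejected
        for i, item in enumerate(value):
            check_default(schema["items"], item, f"{path}[{i}]")
    elif kind == "map":
        if not isinstance(value, dict):
            fail("map default must be a JSON object")
        for key, item in value.items():
            check_default(schema["values"], item, f"{path}[{key!r}]")
    elif kind == "record":
        if not isinstance(value, dict):
            fail("record default must be a JSON object")
        for f in schema["fields"]:
            if f["name"] in value:
                check_default(f["type"], value[f["name"]], f"{path}.{f['name']}")
            elif "default" not in f:
                fail(f"record default is missing field '{f['name']}'")
    elif kind == "fixed":
        # Fixed defaults are JSON strings with one character per byte.
        if not isinstance(value, str) or len(value) != schema["size"]:
            fail(f"fixed default must be a string of exactly {schema['size']} characters")
    elif kind == "enum":
        if value not in schema["symbols"]:
            fail(f"enum default must be one of {schema['symbols']}")

# Hypothetical bad defaults, one per limitation.
for schema, value, path in [
    ({"type": "array", "items": "int"}, 0, "a"),
    ("int", 2**31, "b"),
    ({"type": "fixed", "name": "F", "size": 4}, "ab", "c"),
    ({"type": "array", "items": {"type": "enum", "name": "E", "symbols": ["X"]}}, ["Y"], "d"),
]:
    try:
        check_default(schema, value, path)
    except DefaultError as err:
        print(err)
```

Passing the path down through every recursive call is what lets the error message name the offending field, including nested ones such as d[0] in the last case.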
To compensate for these deficiencies we implemented our own checking that is more strict than Avro's. To do this, we serialize the default value using our own JSON serializer in a special mode where default values are applied. Any errors during serialization indicate that the default value is invalid.
Something similar might be done in Avro itself, for example, if the JSON encoder were made to operate in a special mode where default values are applied.
--mark
-
Re: Schema validation of a field's default values
Doug Cutting 2012-11-05, 18:46
Mark,
I'd welcome improvements to default value validation in Avro. For performance, I think this should be an explicit, separate operation from parsing schemas. But we might invoke it on schemas at various points, e.g., when creating a file. If you are able, please contribute your implementation by filing an issue in Avro's Jira.
Thanks,
Doug
-
Re: Schema validation of a field's default values
Mark Hayes 2012-11-05, 21:44
I'm not sure how it can be added to Avro without breaking existing apps. If ResolvingGrammarDecoder.encode were changed to correct the deficiencies I mentioned, existing schemas that don't pass the stricter rules would cause errors in the builder, and errors when applying default values from a reader schema during deserialization.
--mark
-
Re: Schema validation of a field's default values
Doug Cutting 2012-11-05, 22:01
We should probably follow Postel's law and be liberal in what we accept during deserialization. So we might more rigorously check default values when writing even though that's not when they're used. Most schemas that are used when writing are also used for reading.
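One way to picture the "strict when writing, liberal when reading" split is the following sketch in plain Python. Everything in it is hypothetical: vet_schema, InvalidDefault and toy_problems are invented names, and the checker is a deliberately tiny stand-in for a real default-value validator.

```python
import warnings

class InvalidDefault(ValueError):
    """Raised on the write path when a schema carries a bad default."""

def vet_schema(schema, problems_in, for_writing):
    """Check defaults strictly for writers and leniently for readers.

    `problems_in` is any callable that returns a list of human-readable
    problems found in the schema's default values.
    """
    problems = problems_in(schema)
    if not problems:
        return schema
    if for_writing:
        # Writers control the schema, so refuse to proceed.
        raise InvalidDefault("; ".join(problems))
    # Readers may be handed old schemas: warn but keep going.
    for problem in problems:
        warnings.warn(f"tolerating questionable default: {problem}")
    return schema

def toy_problems(schema):
    """Stand-in check: flag int fields whose default is not a JSON integer."""
    return [
        f"field '{f['name']}' has non-integer default {f['default']!r}"
        for f in schema.get("fields", [])
        if f.get("type") == "int" and "default" in f
        and (isinstance(f["default"], bool) or not isinstance(f["default"], int))
    ]

bad = {"type": "record", "name": "R",
       "fields": [{"name": "n", "type": "int", "default": "x"}]}

vet_schema(bad, toy_problems, for_writing=False)    # emits a warning only
try:
    vet_schema(bad, toy_problems, for_writing=True)
except InvalidDefault as err:
    print("refused to write:", err)
```

Because the strict path is only taken when writing, existing reader schemas that would fail the stricter rules keep working, which is the compatibility concern raised earlier in the thread.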