-
Limitations in enum symbols not mentioned in the spec
Francis Galiegue 2013-02-27, 16:24
Hello,

I have tried to parse this schema:

    {
        "name": "gender",
        "type": "enum",
        "symbols": [ "MALE", "FEMALE", "WHO CARES?" ]
    }

But the parser complains about an illegal character in the third symbol.

The problem is, nothing in the spec as far as I can see says that the set of usable code points in a symbol is limited at all...

So, what is this allowed set of code points?

--
Francis Galiegue, [EMAIL PROTECTED]
JSON Schema in Java: http://json-schema-validator.herokuapp.com
-
Re: Limitations in enum symbols not mentioned in the spec
Francis Galiegue 2013-02-27, 16:27
OK, beginning of answer to self:

    if (!(Character.isLetter(first) || first == '_'))
      throw new SchemaParseException("Illegal initial character: "+name);
    for (int i = 1; i < length; i++) {
      char c = name.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '_'))
        throw new SchemaParseException("Illegal character in: "+name);
    }

It therefore means that any Unicode letter or digit, or the underscore, is allowed anywhere, except in the first position, which must not be a digit. So, it means the following is legal:

    [ "mémé", "dans", "les", "orties" ]

Right?
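To make the behaviour of the quoted check easier to try out, here is a minimal standalone sketch of it. It is not Avro's actual source: the class name `LiberalNameCheck` and the method `isAccepted` are hypothetical, it returns a boolean instead of throwing `SchemaParseException`, and rejecting empty names is an assumption.

```java
// Hypothetical wrapper around the check quoted above; not Avro's real class.
class LiberalNameCheck {

    // Returns true when the quoted check would let the name through.
    static boolean isAccepted(String name) {
        if (name == null || name.isEmpty()) {
            return false; // assumption: empty names are rejected
        }
        char first = name.charAt(0);
        // First character: any Unicode letter or an underscore, but no digit.
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        // Remaining characters: any Unicode letter or digit, or an underscore.
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "m\u00e9m\u00e9" is "mémé", written with escapes to avoid source-encoding issues.
        String[] samples = {"MALE", "_hidden", "m\u00e9m\u00e9", "9lives", "WHO CARES?"};
        for (String s : samples) {
            System.out.println(s + " -> " + isAccepted(s));
        }
    }
}
```

Under this sketch, a leading underscore and accented letters pass, while a leading digit, a space, or a question mark do not.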
-
Re: Limitations in enum symbols not mentioned in the spec
Doug Cutting 2013-02-27, 17:55
The specification is more restrictive. It says:

    The name portion of a fullname, record field names, and enum symbols must:
    - start with [A-Za-z_]
    - subsequently contain only [A-Za-z0-9_]

The Java implementation is more liberal in what it accepts.

This is discussed in https://issues.apache.org/jira/browse/AVRO-1022.

Doug
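The two character classes quoted from the specification can be combined into a single regular expression, `[A-Za-z_][A-Za-z0-9_]*`. The sketch below checks names against that pattern only; the class name `SpecNameCheck` and the method `matchesSpec` are hypothetical and are not part of Avro.

```java
// Hypothetical helper; it encodes only the two rules quoted from the specification.
class SpecNameCheck {

    // First character from [A-Za-z_], every following character from [A-Za-z0-9_].
    private static final String SPEC_NAME_REGEX = "[A-Za-z_][A-Za-z0-9_]*";

    static boolean matchesSpec(String name) {
        // String.matches requires the whole string to match the pattern.
        return name != null && name.matches(SPEC_NAME_REGEX);
    }

    public static void main(String[] args) {
        // "m\u00e9m\u00e9" is "mémé", written with escapes to avoid source-encoding issues.
        String[] samples = {"MALE", "WHO_CARES", "WHO CARES?", "m\u00e9m\u00e9", "1st"};
        for (String s : samples) {
            System.out.println(s + " -> " + matchesSpec(s));
        }
    }
}
```

A symbol such as "mémé" fails this pattern even though a check based on `Character.isLetter` lets it through, which is the gap between specification and implementation described here.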
-
Re: Limitations in enum symbols not mentioned in the spec
Francis Galiegue 2013-02-27, 18:12
Argh! OK, I have misread that part, I thought it only applied to names...

Sorry for the noise :/

One additional question: it applies to record field names, but _not_ map keys? Curious...
-
Re: Limitations in enum symbols not mentioned in the spec
Doug Cutting 2013-02-27, 18:40
The restrictions on names are primarily to facilitate translation into programming languages. Map keys are user data, not part of a schema that might be so translated. We restricted map keys to strings, since the standard map implementations in some programming languages don't permit arbitrary types in keys.

Doug
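The distinction between schema names and map keys can be illustrated in plain Java, without Avro. In the sketch below, `NamesVersusKeys`, `looksLikeJavaIdentifier`, and `sampleCounts` are hypothetical names chosen for the example. The identifier check looks at characters only and ignores reserved words.

```java
// Hypothetical illustration: names must survive translation to code, map keys need not.
class NamesVersusKeys {

    // Character-level test of whether s could serve as a Java identifier.
    static boolean looksLikeJavaIdentifier(String s) {
        if (s == null || s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // A map key is ordinary data, so any string is usable.
    // Types are fully qualified to keep the sketch free of import lines.
    static java.util.Map<String, Integer> sampleCounts() {
        java.util.Map<String, Integer> counts = new java.util.HashMap<>();
        counts.put("MALE", 3);
        counts.put("WHO CARES?", 1);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJavaIdentifier("MALE"));
        System.out.println(looksLikeJavaIdentifier("WHO CARES?"));
        System.out.println(sampleCounts().get("WHO CARES?"));
    }
}
```

A string like "WHO CARES?" could not become the name of a generated enum constant or field, but it causes no trouble as a key, because keys never turn into identifiers.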
-
Re: Limitations in enum symbols not mentioned in the spec
Francis Galiegue 2013-02-27, 18:43
I was talking about what is allowed in map keys, not values -- if map keys were able to be anything other than strings, Avro could not be mapped to JSON.
-
Re: Limitations in enum symbols not mentioned in the spec
Tatu Saloranta 2013-02-27, 18:53
On Wed, Feb 27, 2013 at 10:43 AM, Francis Galiegue <[EMAIL PROTECTED]> wrote:
> I was talking about what is allowed in map keys, not values -- if map
> keys were able to be anything other than strings, Avro could not be
> mapped to JSON.

JSON does not limit keys in any way, so this is not true.

But maybe you mean that Avro could not be mapped to JSON constrained by a specific kind of JSON Schema.

I think Avro actually handles this part in more pragmatic terms than JSON Schema, since many programming languages have a clear distinction between typed objects (in Java, POJOs) and untyped "hash table"-like structures (Maps).
-+ Tatu +-
-
Re: Limitations in enum symbols not mentioned in the spec
Doug Cutting 2013-02-27, 19:00
On Wed, Feb 27, 2013 at 10:43 AM, Francis Galiegue <[EMAIL PROTECTED]> wrote:
> I was talking about what is allowed in map keys, not values
So was I.
> if map keys were able to be anything other than strings, Avro could not be
> mapped to JSON.
Yes, JavaScript is an example of a programming language whose standard map implementation doesn't provide support for non-string keys. JavaScript however is not an example of a programming language whose identifiers permit only a limited set of characters, since (I believe) one can escape arbitrary characters in property accessors when using the dot syntax.
Doug
-
Re: Limitations in enum symbols not mentioned in the spec
Francis Galiegue 2013-02-27, 19:10
On Wed, Feb 27, 2013 at 7:53 PM, Tatu Saloranta <[EMAIL PROTECTED]> wrote:
[...]
>> I was talking about what is allowed in map keys, not values -- if map
>> keys were able to be anything other than strings, Avro could not be
>> mapped to JSON.
>
> JSON does not limit keys in any way, so this is not true.

I was talking about member names, not member values ;)

> But maybe you mean that Avro could not be mapped to JSON constrained
> by a specific kind of JSON Schema.
> I think Avro actually handles this part in more pragmatic terms than
> JSON Schema, since many programming languages have clear distinction
> between typed objects (in Java, POJOs) and untyped "hash table" like
> structure (Maps).

Yes, that is true, but on the other hand not all data is destined to be mapped to POJOs ;)
-
Re: Limitations in enum symbols not mentioned in the spec
Tatu Saloranta 2013-02-28, 06:41
On Wed, Feb 27, 2013 at 11:10 AM, Francis Galiegue <[EMAIL PROTECTED]> wrote:
>> JSON does not limit keys in any way, so this is not true.
>
> I was talking about member names, not member values ;)

Ah. I misread your statement there, and (wrongly) thought you meant that key value domains were limited similarly to property names. But you just meant that non-String values are not allowed.
-+ Tatu +-