Avro, mail # user - Limitations in enum symbols not mentioned in the spec


Re: Limitations in enum symbols not mentioned in the spec
Francis Galiegue 2013-02-27, 16:27
On Wed, Feb 27, 2013 at 5:24 PM, Francis Galiegue <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have tried to parse this schema:
>
> {
>     "name": "gender",
>     "type": "enum",
>     "symbols": [ "MALE", "FEMALE", "WHO CARES?" ]
> }
>
> But the parser complains about an illegal character in the third symbol.
>
> The problem is, nothing in the spec as far as I can see says that the
> set of usable code points in a symbol is limited at all...
>
> So, what is this allowed set of code points?
>
> --
> Francis Galiegue, [EMAIL PROTECTED]
> JSON Schema in Java: http://json-schema-validator.herokuapp.com

OK, beginning of answer to self:

    if (!(Character.isLetter(first) || first == '_'))
      throw new SchemaParseException("Illegal initial character: "+name);
    for (int i = 1; i < length; i++) {
      char c = name.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '_'))
        throw new SchemaParseException("Illegal character in: "+name);
    }

It therefore means that any Unicode letter or digit, or the underscore, is
allowed anywhere, except in the first position, where a digit is not
allowed. So, it means the following is legal:

[ "mémé", "dans", "les", "orties" ]

Right?
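
For what it's worth, here is a minimal standalone sketch to try the rule out
without going through the parser. It only mirrors the two checks quoted above;
the class name SymbolNameCheck and the method isValidName are made up for this
mail and are not part of Avro, and rejecting the empty string is my own
assumption, since the quoted excerpt does not show that case.

```java
// Hypothetical helper, not Avro code: it re-implements only the two
// checks from the excerpt quoted above.
final class SymbolNameCheck {

    // Returns true when the name passes both checks: the first character is
    // a letter or '_', every later character is a letter, a digit or '_'.
    static boolean isValidName(String name) {
        // Assumption: empty names are rejected (not shown in the excerpt).
        if (name.isEmpty())
            return false;
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_'))
            return false;
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_'))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // "m\u00e9m\u00e9" is "mémé", escaped to avoid source encoding issues.
        String[] samples = { "MALE", "WHO CARES?", "m\u00e9m\u00e9", "_x", "1st" };
        for (String s : samples)
            System.out.println(s + " -> " + isValidName(s));
    }
}
```

With these checks, "WHO CARES?" fails on the space (and would fail on the
question mark too), while "mémé" passes because Character.isLetter() accepts
any Unicode letter, not just ASCII.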

--
Francis Galiegue, [EMAIL PROTECTED]
JSON Schema in Java: http://json-schema-validator.herokuapp.com