Avro, mail # user - Primitive type aliases


Re: Primitive type aliases
Jay Hacker 2013-04-15, 16:03
Doug, thanks for your reply.  Inlining a single-field record seems like a
bit of a heavyweight solution, even if there is no serialization overhead.
I wouldn't want to complicate the spec with this special case.
Additionally, my goal is to simplify my schema definition, and this would
be moving in the opposite direction.
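
For reference, a rough sketch of what the "inline" proposal (quoted below) would ask of a runtime. This is plain Python over already-parsed schema dicts, not the avro library's API; the helper names `is_inline`, `unwrap`, and `wrap` are hypothetical, and "inline" is only the annotation suggested in this thread, not part of the spec.

```python
# Schema from the proposal, already parsed into a dict.
DATE_SCHEMA = {
    "type": "record",
    "name": "Date",
    "inline": True,
    "fields": [{"name": "value", "type": "string"}],
}


def is_inline(schema):
    """True for a record schema marked inline that has exactly one field."""
    return (
        isinstance(schema, dict)
        and schema.get("type") == "record"
        and schema.get("inline") is True
        and len(schema.get("fields", [])) == 1
    )


def unwrap(schema, datum):
    """On read: hand back the single field's value instead of the record."""
    if is_inline(schema):
        return datum[schema["fields"][0]["name"]]
    return datum


def wrap(schema, value):
    """On write: rebuild the one-field record from a bare value."""
    if is_inline(schema):
        return {schema["fields"][0]["name"]: value}
    return value


as_read = unwrap(DATE_SCHEMA, {"value": "2013-04-15"})
as_written = wrap(DATE_SCHEMA, "2013-04-15")
```

The serialized bytes would be unchanged either way; only the in-memory shape differs.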

Right now, I just have an extra metadata key in my field definition of
"xtype": "date", which I use as my own representation hint.  I feel this
conveys the intent of the schema much more clearly than an inline record
would, and is also much more compact.  I'm trying to get rid of even that,
so I can just say "type": "date".  I distribute my schema to multiple
parties who need to code to it, and ease of reading and understanding the
schema is important.
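
To make the "xtype" approach concrete, here is a minimal sketch of how such a hint might be consumed after decoding. It uses only the standard library and works on the schema JSON directly, so nothing here is avro library API. The record name "Event", the field names, the helper names, and the assumption that dates are ISO 8601 strings are all illustrative.

```python
import json
from datetime import date

# Hypothetical schema: "xtype" is an extra metadata key on the field.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "start", "type": "string", "xtype": "date"},
    {"name": "label", "type": "string"}
  ]
}
"""


def date_fields(schema):
    """Return the names of fields whose "xtype" hint is "date"."""
    return {f["name"] for f in schema["fields"] if f.get("xtype") == "date"}


def apply_hints(schema, record):
    """Return a copy of record with hinted string fields parsed into dates."""
    hinted = date_fields(schema)
    return {
        name: date.fromisoformat(value) if name in hinted else value
        for name, value in record.items()
    }


schema = json.loads(SCHEMA_JSON)
decoded = {"start": "2013-04-15", "label": "reply"}  # as read from Avro
converted = apply_hints(schema, decoded)
```

The field still serializes as an ordinary string; the hint only affects how my own code interprets the value.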

Would it be difficult to add built-in types to the list of things you can
alias?  This also does not seem to need a schema language change -- perhaps
just a spec clarification ;) -- and would yield simpler schemas.

On Fri, Apr 12, 2013 at 5:35 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> % grep aliases lang/py/src/avro/*.py
> %
>
> I don't see any support for aliases in Python.
>
> Doug
>
>
> On Fri, Apr 12, 2013 at 2:22 PM, Jeremy Kahn <[EMAIL PROTECTED]> wrote:
>
>> This annotation behavior would be very useful for representing things
>> like "age" (a non-negative number), URI (constrained subset of "string")
>> etc.
>>
>> Doug, when you say "Python doesn't support aliases", what do you mean?
>> What behavior should it support? I understood aliases to be only used in
>> schema evolution, and the Python avro libraries seem to correctly respect
>> aliases when reading from another schema... or don't they?
>>
>> --Jeremy
>>
>>
>>
>>
>> On Fri, Apr 12, 2013 at 2:09 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>
>>> Aliases are used for type names (records, enums, & fixed) and field
>>> names.  Also, I don't think aliases are implemented in Python.
>>>
>>> You could define a Date record with a single string and use it.  Records
>>> have no storage overhead, so this will result in the same serialized form
>>> as a string field.  If you don't want the nested structure in memory, then
>>> perhaps we should consider an "inline" schema annotation.  This might look
>>> like:
>>>
>>> {"type":"record", "name":"Date", "inline":true,
>>>  "fields":[{"name":"value", "type":"string"}]}
>>> {"type":"record", "name":"Test",
>>>  "fields":[{"name":"date", "type":"Date"}]}
>>>
>>> Then the Python implementation might be altered so that when it reads an
>>> inline record with a single field then it returns the value of that single
>>> field, and similarly accepts a value of the field on write.  This would be
>>> a representation-hint to the runtime, and would not affect the schema
>>> language or serialization so should be completely compatible.
>>>
>>> Thoughts?
>>>
>>> Doug
>>>
>>>
>>>
>>> On Fri, Apr 12, 2013 at 12:15 PM, Jay Hacker <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'd like to be able to alias primitive types, for example to indicate
>>>> that a field of type "date" is really a string that I should treat
>>>> specially.  The spec says "Named types and fields may have aliases," which
>>>> suggests it ought to work ("string" is a named type...).
>>>>
>>>> I don't really know how to express an alias for a primitive, but things
>>>> like this:
>>>>
>>>> {
>>>>     "type": "record",
>>>>     "name": "alias-test",
>>>>     "fields": [
>>>>         {"name": "start", "type": {"type": "string", "aliases": ["date"]}},
>>>>         {"name": "end",   "type": "date"}
>>>>     ]
>>>> }
>>>>
>>>> don't work (at least not in the Python 1.7.4 implementation: 'Type
>>>> property "date" not a valid Avro schema').  How can I alias a primitive
>>>> type, and if not, why not?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>
>>
>