Avro >> mail # user >> How is a union of multiple primitives handled?

Jonathan Coveney 2013-04-05, 16:11
Curt Hagenlocher 2013-04-05, 16:31
Re: How is a union of multiple primitives handled?
Java has its own issues in this regard: when deserializing a JSON string,
if the schema contains a union, you have to supply the value as
{"<type>": <data>}, which seems wrong to me (see the other email thread I
started). I asked this question to understand how it should work in Python,
but also to get a sense of what the fix should be. I have made a patch that
works according to my understanding, but I am still unsure whether that
understanding is correct, and whether the Java treatment of unions in this
case is correct (to me it seems needlessly cumbersome).
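
For concreteness, the wrapping I mean looks roughly like this. It is a
minimal Python sketch that assumes the rule from the Avro specification's
JSON encoding section (a null branch stays a bare null, any other branch
becomes a single-key object named after the type). The helper
to_avro_json and the field name "value" are made up for illustration and
are not part of any Avro library:

```python
import json


def to_avro_json(branch, value):
    """Wrap a union value in the {"<type>": <data>} form (hypothetical helper)."""
    if branch == "null":
        # The null branch is written as a bare JSON null, with no wrapper.
        return None
    # Every other branch becomes an object with exactly one key: the type name.
    return {branch: value}


# "value" is an illustrative field whose schema would be ["int", "double"].
record = {"value": to_avro_json("int", 1)}
print(json.dumps(record))  # {"value": {"int": 1}}
```

So a plain {"value": 1} is not enough for a union field in that encoding;
the branch has to be named explicitly.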

Thanks for your help
2013/4/5 Curt Hagenlocher <[EMAIL PROTECTED]>

> This is a Python-specific issue, and results from the interplay of two
> implementation-specific features:
> 1) Python ints, longs and floats can all legally be serialized as an Avro
> double (or float). See io.py, line 118.
> 2) The union serializer picks the first type that allows legal
> serialization.
> I would be surprised if you got the same thing in Java; it's not the kind
> of behavior I would expect from a statically-typed language.
> On Fri, Apr 5, 2013 at 9:11 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>> The following gist illustrates my question:
>> https://gist.github.com/jcoveney/5320422
>> It seems pretty surprising to me that these cases all return 1.0,
>> at least in Python (I will now do this in Java, it's just more verbose). Is
>> this an issue with python? Is this an issue period? Is this unexpected?
>> At the very least, if you write 1 to ["int", "double"] you'd expect that
>> it'd get serialized as an int? Or is there a set of rules governing which
>> primitive type to choose? Is it implementation dependent?
>> Also, the case where it throws an error, then returns 0 seems completely
>> wrong. Why would it do that at all? Is it that once it throws an error, it
>> gets into an inconsistent state and nothing is guaranteed?
>> Thanks for helping me understand this!
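
The two features Curt describes can be sketched as follows. This is only
an illustration of his description, not the avro package's code: the
names is_valid and pick_branch are hypothetical, only the int, float and
double branches are modelled, the sketch is written for Python 3 (which
has no separate long type), and excluding bool is a choice made here.

```python
def is_valid(type_name, datum):
    """Return True if datum may be written as the given primitive (sketch)."""
    # bool is a subclass of int in Python; this sketch chooses to exclude it.
    is_number = isinstance(datum, (int, float)) and not isinstance(datum, bool)
    if type_name == "int":
        # Avro ints are 32-bit signed, so only Python ints in that range fit.
        return is_number and isinstance(datum, int) and -2**31 <= datum < 2**31
    if type_name in ("float", "double"):
        # Feature 1: Python ints are accepted as Avro floats/doubles too.
        return is_number
    return False


def pick_branch(union, datum):
    """Feature 2: choose the first union branch that accepts the datum."""
    for type_name in union:
        if is_valid(type_name, datum):
            return type_name
    raise ValueError("datum %r matches no branch of %r" % (datum, union))


print(pick_branch(["double", "int"], 1))    # double
print(pick_branch(["int", "double"], 1))    # int
print(pick_branch(["int", "double"], 1.0))  # double
```

Under these assumptions the branch chosen depends on the order in which
the union lists its types, so writing 1 to ["double", "int"] stores a
double and reads back as 1.0.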
Pankaj Shroff 2013-04-16, 18:07