Avro >> mail # user >> Union resolution in dynamic languages


Union resolution in dynamic languages
When encoding data of a union type, the Avro specification does not say
much about how to choose which type in the union is used. So far I have
mostly used unions so that I can write either null or another simple
type. In these cases, it is fairly obvious how the encoder can
distinguish null from the other type.

However, a union can also contain any named types, so its branches can be
two records, say a Manager record and a NonManager record. I think that
in strongly typed languages the suitable type in the union can be
selected by introspection. But in dynamic languages, these records might
just be represented as maps without any notion of type. In some cases, we
may find that the object has all the attributes of a NonManager but not
those of a Manager, so we can conclude that NonManager is the proper
schema to use. But this can get complicated with nested data structures,
where the attribute that disambiguates things appears at a deeper level.
You can also think of valid scenarios where inspecting the content of the
object cannot unambiguously resolve the union branch.
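
To make the problem concrete, here is a minimal sketch in plain Python (no
Avro library involved). The schemas, field names, and the
matching_branches helper are all hypothetical and exist only for
illustration. It picks a branch by comparing a dict's keys with each
record's field names, and shows a union where that check cannot decide.

```python
# Hypothetical record schemas; the field names are assumptions for illustration.
MANAGER = {"type": "record", "name": "Manager",
           "fields": [{"name": "name", "type": "string"},
                      {"name": "reports", "type": "int"}]}
NON_MANAGER = {"type": "record", "name": "NonManager",
               "fields": [{"name": "name", "type": "string"},
                          {"name": "desk", "type": "string"}]}

# Two records with identical field names: content alone cannot tell them apart.
CELSIUS = {"type": "record", "name": "Celsius",
           "fields": [{"name": "value", "type": "double"}]}
FAHRENHEIT = {"type": "record", "name": "Fahrenheit",
              "fields": [{"name": "value", "type": "double"}]}


def matching_branches(datum, union):
    """Return the names of record branches whose field names equal the dict's keys."""
    keys = set(datum)
    return [schema["name"] for schema in union
            if {field["name"] for field in schema["fields"]} == keys]


# The keys match only NonManager, so the branch is unambiguous.
clear = matching_branches({"name": "Ann", "desk": "4B"}, [MANAGER, NON_MANAGER])

# Both branches match, so inspecting the dict cannot resolve the union.
ambiguous = matching_branches({"value": 21.5}, [CELSIUS, FAHRENHEIT])

print(clear)
print(ambiguous)
```

In the second case an encoder would have to fall back on some rule such as
"first matching branch wins", which silently loses the distinction.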

I notice that the Python implementation uses a two-pass recursive
validation, possibly for the purpose of resolving the union choice.
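
As a rough illustration of what a two-pass approach could look like, here
is a simplified sketch in plain Python. It is not the code of the Avro
Python implementation; the validate and encode functions, the schemas, and
the small set of supported types are assumptions made for this example.
The first pass validates the datum recursively against each branch, and
the second pass writes the zero-based index of the first branch that
validates, followed by the value, as the specification's binary encoding
describes.

```python
def zigzag_varint(n):
    """Encode an integer as Avro does for int/long: zigzag, then base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def validate(schema, datum):
    """First pass: does the datum fit the schema? Recurses into records and unions."""
    if schema == "null":
        return datum is None
    if schema == "string":
        return isinstance(datum, str)
    if schema == "long":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if isinstance(schema, list):  # union: any branch may match
        return any(validate(branch, datum) for branch in schema)
    if isinstance(schema, dict) and schema["type"] == "record":
        return isinstance(datum, dict) and all(
            field["name"] in datum and validate(field["type"], datum[field["name"]])
            for field in schema["fields"])
    return False


def encode(schema, datum):
    """Second pass: write the bytes, choosing the first union branch that validates."""
    if schema == "null":
        return b""
    if schema == "string":
        raw = datum.encode("utf-8")
        return zigzag_varint(len(raw)) + raw
    if schema == "long":
        return zigzag_varint(datum)
    if isinstance(schema, list):
        for index, branch in enumerate(schema):
            if validate(branch, datum):  # full recursive check per branch
                return zigzag_varint(index) + encode(branch, datum)
        raise ValueError("datum matches no branch of the union")
    if isinstance(schema, dict) and schema["type"] == "record":
        return b"".join(encode(field["type"], datum[field["name"]])
                        for field in schema["fields"])
    raise ValueError("unsupported schema in this sketch")


# Hypothetical schemas; the field names are assumptions for illustration.
MANAGER = {"type": "record", "name": "Manager",
           "fields": [{"name": "name", "type": "string"},
                      {"name": "reports", "type": "long"}]}
NON_MANAGER = {"type": "record", "name": "NonManager",
               "fields": [{"name": "name", "type": "string"},
                          {"name": "desk", "type": "string"}]}

null_case = encode(["null", "string"], None)
string_case = encode(["null", "string"], "hi")
record_case = encode([MANAGER, NON_MANAGER], {"name": "Ann", "desk": "4B"})
```

Note that every union met during encoding triggers another recursive
validation of its branches, so deeply nested unions are validated
repeatedly. That is the kind of cost I am asking about.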

I wonder whether much consideration has been given to potentially
complex, indirectly nested union types that might be difficult to
resolve, and that would thus add complexity to the implementation of the
encoders. Are there use cases in practice that involve complex union
decisions?

Wai Yip