Thanks for your response.
I completely agree that your use cases are valuable. I think we also agree
that the right place to layer this is as a separate "translation" or
"transformation" library. I think madness lies in pushing those
transformations into schema JSON; that's not what you're proposing, however,
so all is good.
On Thu, Feb 10, 2011 at 10:28 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
> On 2/4/11 1:16 PM, "Philip Zeyliger" <[EMAIL PROTECTED]> wrote:
> >On Fri, Feb 4, 2011 at 10:02 AM, Scott Carey
> ><[EMAIL PROTECTED]> wrote:
> >> I have been thinking about more advanced type promotion in Avro after
> >> facing more complicated schema evolution issues.
> >My two cents:
> >This way lies madness. Avro (and PB and Thrift) give you some basic tools
> >to evolve an API without doing much extra code. At some point, you end up
> >forking and creating an APIv2, and eventually deprecate APIv1. If you try
> >to make that magical, you'll end up building a programming language.
> I agree that a protocol APIv1 versus APIv2 is an example where exotic
> conversions don't make a lot of sense. The schemas in a protocol API
> aren't persisted long term; they exist only on the wire.
> My use cases are in long term persisted file data, where schema evolution
> spans a much longer time window (forever unless I can re-write all data).
> Having file format v2 not be compatible with file format v1 is a lot
> harder to swallow than API v2 not being compatible with API v1.
> I have another use case in mind as well. Schema transformation is a
> common need for interoperation with other frameworks. Cascading doesn't
> support nested records (or it didn't last I looked), so a Cascading Tap
> has to either flatten them or not support them. Pig doesn't support
> unions, so they are either not supported, or manipulated into non-union
> structures. Schema transformation is a common use case when integrating
> Avro with pre-existing systems.
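As a minimal sketch of the flattening transformation described above -- treating schemas as plain dicts in Avro's JSON form; the function and record names are illustrative, not an existing Avro API:

```python
# Sketch: flatten a nested Avro record schema into dotted-name fields,
# the kind of transformation a Cascading or Pig adapter needs.
# Schemas are plain dicts in Avro's JSON form; names are illustrative.

def flatten(schema, prefix=""):
    """Return a flat list of (dotted_name, type) for a record schema."""
    fields = []
    for f in schema["fields"]:
        name = prefix + f["name"]
        t = f["type"]
        if isinstance(t, dict) and t.get("type") == "record":
            # Recurse into the nested record, extending the dotted prefix.
            fields.extend(flatten(t, name + "."))
        else:
            fields.append((name, t))
    return fields

nested = {
    "type": "record", "name": "Person",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "address", "type": {
            "type": "record", "name": "Address",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": "string"},
            ]}},
    ],
}

print(flatten(nested))
# [('id', 'long'), ('address.city', 'string'), ('address.zip', 'string')]
```

A real helper library would also have to handle name collisions, arrays, and maps, but the shape of the work is the same.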
> When working on Pig and Hive adapter prototypes, there turned out to be a
> lot of overlap and repeated work -- and it's almost all in schema
> transformation (flattening, unions, etc.) and classification (is the
> schema recursive?). If there were a general helper library for this sort
> of work, the remaining adapter would be rather small and would not
> require so much Avro domain knowledge.
> >By all means define a language that converts from one Avro record into
> >another. An Avro expression language would be quite useful, actually.
> >Putting it in the core, however, strikes me as feature creep.
> Core should definitely remain simple. Anything like this should be an
> optional library. Support for each transformation should be optional as
> well -- many languages might have string <> int, while only a couple have
> union branch materialization.
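To make "union branch materialization" concrete, here is a hedged sketch of one possible form of it: splitting a union-typed field into one nullable field per non-null branch, so union-free systems like Pig can consume the data. The function and the field-naming convention are illustrative assumptions, not an Avro API:

```python
# Sketch: "materialize" a union field as one nullable field per non-null
# branch. Naming convention (field_branch) is an illustrative assumption.

def materialize_union(field_name, branches):
    """E.g. ("v", ["null", "string"]) -> [("v_string", ["null", "string"])]."""
    out = []
    for b in branches:
        if b == "null":
            continue  # null is folded into each materialized field instead
        out.append(("%s_%s" % (field_name, b), ["null", b]))
    return out

print(materialize_union("value", ["null", "string", "long"]))
# [('value_string', ['null', 'string']), ('value_long', ['null', 'long'])]
```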
> The more complicated transforms are mostly useful for frameworks that want
> to use Avro in a way that can interop with other frameworks using Avro.
> The initial reaction to the above statement is probably, "If they are both
> using Avro already, shouldn't they automatically be able to share data?"
> The answer is no. They aren't using Avro as their internal schema system.
> They are _translating_ between their internal schema system and Avro,
> potentially applying various transformation rules. So it works fine for
> the lowest common denominator of supported schemas; anything more
> complicated and it won't. This is not a fault of Avro; it is the nature
> of compatibility between two non-Avro schema systems.
> Hive supports Maps with integers as keys. Pig does not. These can be
> made to interop through Avro if both systems share their schema
> translation techniques, but not otherwise.
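As a hedged sketch of one such shared translation technique: Avro map keys are always strings, so an int-keyed map can be lowered to an array of {key, value} records and rebuilt on the other side. The entry-record shape is an illustrative convention, not something Avro defines:

```python
# Sketch: lower an int-keyed map (expressible in Hive, not in Avro maps or
# Pig) to an array of key/value entry records, and rebuild it. The entry
# shape {"key": ..., "value": ...} is an illustrative convention.

def map_to_entry_array(m):
    """Represent {int: value} as a sorted list of entry records."""
    return [{"key": k, "value": v} for k, v in sorted(m.items())]

def entry_array_to_map(entries):
    """Inverse transformation: rebuild the int-keyed map."""
    return {e["key"]: e["value"] for e in entries}

m = {3: "c", 1: "a"}
entries = map_to_entry_array(m)
print(entries)
# [{'key': 1, 'value': 'a'}, {'key': 3, 'value': 'c'}]
assert entry_array_to_map(entries) == m
```

Both sides only interoperate if they agree on this convention -- which is exactly the shared-translation-technique point above.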
> >-- Philip