Avro >> mail # dev >> Promote to string


Re: Promote to string
Scott,

Thanks for your response.

I completely agree that your use cases are valuable.  I think we also agree
that the right place to layer this is as a separate "translation" or
"transformation" library.  I think madness lies in pushing those
transformations into schema JSON; that's not what you're proposing, however,
so all is good.

-- Philip

On Thu, Feb 10, 2011 at 10:28 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

>
>
> On 2/4/11 1:16 PM, "Philip Zeyliger" <[EMAIL PROTECTED]> wrote:
>
> >On Fri, Feb 4, 2011 at 10:02 AM, Scott Carey
> ><[EMAIL PROTECTED]> wrote:
> >
> >> I have been thinking about more advanced type promotion in Avro after
> >> facing more complicated schema evolution issues.
> >
> >
> >My two cents:
> >
> >This way lies madness.  Avro (and PB and Thrift) give you some basic tools
> >to evolve an API without doing much extra code.  At some point, you end up
> >forking and creating an APIv2, and eventually deprecate APIv1.  If you try
> >to make that magical, you'll end up building a programming language.
>
> I agree that protocol APIv1 versus APIv2 is an example where exotic
> conversions don't make a lot of sense.  The schemas in a protocol API
> aren't persisted long term; they are only on the wire.
>
> My use cases are in long term persisted file data, where schema evolution
> spans a much longer time window (forever unless I can re-write all data).
> Having file format v1 be incompatible with file format v2 is a lot
> harder to swallow than API v1 being incompatible with API v2.
>
> I have another use case in mind as well.  Schema transformation is a
> common need for interoperation with other frameworks. Cascading doesn't
> support nested records (or it didn't last I looked), so a Cascading Tap
> has to either flatten them or not support them.  Pig doesn't support
> unions, so they are either not supported, or manipulated into non-union
> structures.  Schema transformation is a common use case when integrating
> Avro with pre-existing systems.
> When working on Pig and Hive adapter prototypes, there turned out to be a
> lot of overlap and repeated work -- and it's almost all in schema
> transformation (flattening, unions, etc), classification (recursive?), and
> translation.
> If there was a general helper library for this sort of work, then the
> remaining adapter would be rather small and not require so much Avro
> domain knowledge.
>
>
> >
> >By all means define a language that converts from one Avro record into
> >another.  An Avro expression language would be quite useful, actually.
> >Putting it in the core, however, strikes me as feature creep.
>
> Core should definitely remain simple.  Anything like this should be an
> optional library.  Support for each transformation should be optional as
> well -- many languages might have string <> int, while only a couple have
> union branch materialization.
>
> The more complicated transforms are mostly useful for frameworks that want
> to use Avro in a way that can interop with other frameworks using Avro.
>
> The initial reaction to the above statement is probably, "If they are both
> using Avro already, shouldn't they automatically be able to share data?"
> The answer is no.  They aren't using Avro as their internal schema system.
> They are _translating_ between their internal schema system and Avro,
> potentially applying various transformation rules.  So, for the lowest
> common denominator supported schemas, it works fine; for anything more
> complicated, it won't.  This is not a fault of Avro; it is the nature
> of compatibility between two non-Avro schema systems.
> Hive supports Maps with integers as keys.  Pig does not.  These can be
> made to interop through Avro if both systems share their schema
> translation techniques, but not otherwise.
>
> >
> >-- Philip
>
>
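
For readers following the thread, here is a minimal sketch of the kind of schema transformation Scott mentions for frameworks without nested records (the Cascading case): flattening a nested record schema and its data. It is only an illustration, not code from Avro or from any adapter discussed here. It works on schemas as plain parsed-JSON dicts rather than through the Avro library, and the names `flatten_fields`, `flatten_schema`, `flatten_datum`, the `_` separator, and the `User`/`Address` example schema are all assumptions made up for this example.

```python
# Hypothetical helpers (not part of Avro): flatten nested record schemas.
# Schemas are plain dicts, i.e. the result of json.loads() on an Avro schema.

def is_record(ftype):
    """True if a field type is an inline record definition."""
    return isinstance(ftype, dict) and ftype.get("type") == "record"


def flatten_fields(schema, prefix="", sep="_"):
    """Return a flat list of field definitions for a record schema.

    Inline nested records are expanded into prefixed fields. Everything
    else (primitives, arrays, maps, unions, references to named types)
    is passed through unchanged.
    """
    flat = []
    for field in schema["fields"]:
        name = prefix + field["name"]
        ftype = field["type"]
        if is_record(ftype):
            flat.extend(flatten_fields(ftype, prefix=name + sep, sep=sep))
        else:
            flat.append({"name": name, "type": ftype})
    return flat


def flatten_schema(schema, sep="_"):
    """Build a new top-level record schema with no inline nested records."""
    return {
        "type": "record",
        "name": schema["name"],
        "fields": flatten_fields(schema, sep=sep),
    }


def flatten_datum(datum, schema, prefix="", sep="_"):
    """Flatten a datum (nested dicts) using the same naming rule as the schema."""
    flat = {}
    for field in schema["fields"]:
        name = prefix + field["name"]
        value = datum[field["name"]]
        if is_record(field["type"]):
            flat.update(flatten_datum(value, field["type"], prefix=name + sep, sep=sep))
        else:
            flat[name] = value
    return flat


# Example schema and datum, invented for illustration.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "address", "type": {
            "type": "record",
            "name": "Address",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": "string"},
            ],
        }},
    ],
}

flat_schema = flatten_schema(user_schema)
flat_datum = flatten_datum(
    {"id": 1, "address": {"city": "Oslo", "zip": "0150"}}, user_schema
)
```

The sketch deliberately ignores the harder cases raised in the thread: records nested inside unions, arrays, or maps are passed through untouched, recursive schemas are not expanded (a reference by name is just a string here), and field-name collisions after prefixing are not detected. Those are the parts a shared helper library would need to settle so that two adapters apply the same rules.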