Avro >> mail # dev >> Promote to string


Re: Promote to string
Scott,

Thanks for your response.

I completely agree that your use cases are valuable.  I think we also agree
that the right place to layer this is as a separate "translation" or
"transformation" library.  I think madness lies in pushing those
transformations into schema JSON; that's not what you're proposing, however,
so all is good.

-- Philip

On Thu, Feb 10, 2011 at 10:28 AM, Scott Carey <[EMAIL PROTECTED]>wrote:

>
>
> On 2/4/11 1:16 PM, "Philip Zeyliger" <[EMAIL PROTECTED]> wrote:
>
> >On Fri, Feb 4, 2011 at 10:02 AM, Scott Carey
> ><[EMAIL PROTECTED]>wrote:
> >
> >> I have been thinking about more advanced type promotion in Avro after
> >> facing more complicated schema evolution issues.
> >
> >
> >My two cents:
> >
> >This way lies madness.  Avro (and PB and Thrift) give you some basic tools
> >to evolve an API without doing much extra code.  At some point, you end up
> >forking and creating an APIv2, and eventually deprecate APIv1.  If you try
> >to make that magical, you'll end up building a programming language.
>
> I agree that protocol APIv1 versus APIv2 is an example where exotic
> conversions don't make a lot of sense.  The schemas in a protocol API
> aren't persisted long term; they are only on the wire.
>
> My use cases are in long term persisted file data, where schema evolution
> spans a much longer time window (forever unless I can re-write all data).
> Having file format v1 not be compatible with file format v2 is a lot
> harder to swallow than API v2 not being compatible with API v1.
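[The reader-side promotion being discussed can be sketched as follows. This is a hypothetical illustration, not the Avro API: values written under an old "writer" schema type are coerced to the current "reader" schema type on read. Only the numeric widenings in the table exist in Avro's resolution rules today; `("int", "string")` is the proposed addition, and `resolve()` is an invented name.]

```python
# Hypothetical sketch of reader-side type promotion for persisted files.
# PROMOTIONS and resolve() are illustrative, not part of the Avro API.
PROMOTIONS = {
    ("int", "long"): int,      # widenings Avro's spec already allows
    ("int", "float"): float,
    ("int", "double"): float,
    ("long", "double"): float,
    ("int", "string"): str,    # the proposed "promote to string"
    ("long", "string"): str,
}

def resolve(value, writer_type, reader_type):
    """Coerce a value from the writer's schema type to the reader's."""
    if writer_type == reader_type:
        return value
    convert = PROMOTIONS.get((writer_type, reader_type))
    if convert is None:
        raise TypeError("cannot promote %s to %s" % (writer_type, reader_type))
    return convert(value)

# File v1 stored an id as an int; v2's reader schema declares it a string.
assert resolve(42, "int", "string") == "42"
assert resolve(42, "int", "long") == 42
```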
>
> I have another use case in mind as well.  Schema transformation is a
> common need for interoperation with other frameworks. Cascading doesn't
> support nested records (or it didn't last I looked), so a Cascading Tap
> has to either flatten them or not support them.  Pig doesn't support
> unions, so they are either not supported, or manipulated into non-union
> structures.  Schema transformation is a common use case when integrating
> Avro with pre-existing systems.
> When working on Pig and Hive adapter prototypes, there turned out to be a
> lot of overlap and repeated work -- and it's almost all in schema
> transformation (flattening, unions, etc.), classification (recursive?), and
> translation.
> If there was a general helper library for this sort of work, then the
> remaining adapter would be rather small and not require so much Avro
> domain knowledge.
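[The flattening transformation such an adapter needs, for a target like Cascading with no nested records, can be sketched as below. Records are modeled as plain dicts, and the dotted-name convention is an assumption for illustration, not what any particular adapter does.]

```python
# Minimal sketch of record flattening: nested fields get dotted
# top-level names. Plain dicts stand in for Avro records.
def flatten(record, prefix=""):
    """Flatten nested records into one level using dotted field names."""
    flat = {}
    for name, value in record.items():
        key = prefix + name
        if isinstance(value, dict):   # nested record: recurse into it
            flat.update(flatten(value, key + "."))
        else:
            flat[key] = value
    return flat

rec = {"id": 1, "owner": {"name": "scott", "email": "s@example.com"}}
assert flatten(rec) == {"id": 1, "owner.name": "scott",
                        "owner.email": "s@example.com"}
```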
>
>
> >
> >By all means define a language that converts from one Avro record into
> >another.  An Avro expression language would be quite useful, actually.
> >Putting it in the core, however, strikes me as feature creep.
>
> Core should definitely remain simple.  Anything like this should be an
> optional library.  Support for each transformation should be optional as
> well -- many languages might have string <> int, while only a couple have
> union branch materialization.
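[Union branch materialization, for a framework like Pig with no union type, can be sketched like this: the union becomes a record with one nullable field per branch plus a tag naming the populated branch. The `_branch` field name and function name are assumptions for illustration only.]

```python
# Hypothetical sketch of union branch materialization: a union value
# becomes a record with one nullable field per branch. "_branch" is an
# invented tag field, not an Avro convention.
def materialize_union(branches, branch_name, value):
    """Rewrite a union value as a record with one field per branch."""
    if branch_name not in branches:
        raise ValueError("unknown branch %r" % branch_name)
    record = {name: None for name in branches}
    record[branch_name] = value
    record["_branch"] = branch_name
    return record

# A ["string", "int"] union currently holding its int branch:
rec = materialize_union(["string", "int"], "int", 7)
assert rec == {"string": None, "int": 7, "_branch": "int"}
```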
>
> The more complicated transforms are mostly useful for frameworks that want
> to use Avro in a way that can interop with other frameworks using avro.
>
> The initial reaction to the above statement is probably, "If they are both
> using Avro already, shouldn't they automatically be able to share data?"
> The answer is no.  They aren't using Avro as their internal schema system.
>  They are _translating_ between their internal schema system and Avro,
> potentially applying various transformation rules.  So, for the lowest
> common denominator of supported schemas it works fine; anything more
> complicated and it won't.  This is not a fault of Avro; it is the nature
> of compatibility between two non-Avro schema systems.
> Hive supports Maps with integers as keys.  Pig does not.  These can be
> made to interop through Avro if both systems share their schema
> translation techniques, but not otherwise.
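[One such shared translation rule can be sketched as follows. Avro (like Pig) only allows string map keys, so an int-keyed Hive map would be rewritten as an array of key/value records and restored on the way back. The entry-record shape here is an assumption both systems would have to agree on, which is exactly the "shared translation techniques" point.]

```python
# Sketch of translating an int-keyed map through Avro, which only
# permits string map keys, as a list of {key, value} entry records.
def map_to_entries(m):
    """Rewrite a map with non-string keys as a list of key/value records."""
    return [{"key": k, "value": v} for k, v in m.items()]

def entries_to_map(entries):
    """Inverse rewrite, recovering the original map."""
    return {e["key"]: e["value"] for e in entries}

hive_map = {1: "a", 2: "b"}
assert entries_to_map(map_to_entries(hive_map)) == hive_map
```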
>
> >
> >-- Philip
>
>