Kafka, mail # user - Versioning Schema's


Re: Versioning Schema's
Jun Rao 2013-06-13, 04:00
Yes, we just have a customized encoder that writes the first 4 bytes of the
MD5 of the schema, followed by the Avro bytes.
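
A minimal sketch of that framing, for illustration only. The thread does not say how the schema text is canonicalized before hashing or how consumers look the schema up, so hashing the raw schema JSON and the dict-based `registry` are assumptions, and the names `schema_id`, `encode`, and `decode` are hypothetical. The Avro payload is treated as opaque bytes.

```python
import hashlib

SCHEMA_ID_LEN = 4  # the thread mentions the first 4 bytes of the schema's MD5


def schema_id(schema_json: str) -> bytes:
    """Return the first 4 bytes of the MD5 digest of the schema text.

    Assumption: the raw schema JSON is hashed as-is (no canonicalization).
    """
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:SCHEMA_ID_LEN]


def encode(schema_json: str, avro_bytes: bytes) -> bytes:
    """Prefix the already-serialized Avro payload with the schema id."""
    return schema_id(schema_json) + avro_bytes


def decode(message: bytes, registry: dict) -> tuple:
    """Split a message into (schema text, Avro payload).

    `registry` is a hypothetical mapping from schema id to schema text.
    """
    sid, payload = message[:SCHEMA_ID_LEN], message[SCHEMA_ID_LEN:]
    return registry[sid], payload
```

With this layout the consumer reads a fixed 4-byte prefix and then hands the rest to a normal Avro decoder, so no separate envelope schema is involved.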

Thanks,

Jun
On Wed, Jun 12, 2013 at 9:50 AM, Shone Sadler <[EMAIL PROTECTED]> wrote:

> Jun,
> I like the idea of an explicit version field, if the schema can be derived
> from the topic name itself. The storage (say 1-4 bytes) would require less
> overhead than a 128-bit MD5, at the added cost of managing the version number.
>
> Is it correct to assume that your applications are using two schemas then:
> one system-level schema to deserialize the schema id and bytes for the
> application message, and a second schema to deserialize those bytes with the
> application schema?
>
> Thanks again!
> Shone
>
>
> On Wed, Jun 12, 2013 at 11:31 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Actually, currently our schema id is the MD5 of the schema itself. Not
> > fully sure how this compares with an explicit version field in the
> > schema.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > At LinkedIn, we are using option 2.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hello everyone,
> > >>
> > >> After doing some searching on the mailing list for best practices on
> > >> integrating Avro with Kafka, there appear to be at least 3 options for
> > >> integrating the Avro schema: 1) embedding the entire schema within the
> > >> message, 2) embedding a unique identifier for the schema in the message,
> > >> and 3) deriving the schema from the topic/resource name.
> > >>
> > >> Option 2 appears to be the best option in terms of both efficiency and
> > >> flexibility. However, from a programming perspective it complicates the
> > >> solution with the need for both an envelope schema (containing a "schema
> > >> id" and "bytes" field for record data) and a message schema (containing
> > >> the application-specific message fields). This requires two levels of
> > >> serialization/deserialization.
> > >> Questions:
> > >> 1) How are others dealing with versioning of schemas?
> > >> 2) Is there a more elegant means of embedding a schema id in an Avro
> > >> message (I am new to both currently ;-)?
> > >>
> > >> Thanks in advance!
> > >>
> > >> Shone
> > >>
> > >
> > >
> >
>
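
For comparison, the explicit version field raised in the thread (schema derived from the topic name, plus a small version number in the message) could be framed as in the sketch below. This is an illustration under stated assumptions, not something any poster describes using: the 2-byte big-endian width is one arbitrary choice within the "1-4 bytes" mentioned, the `schemas` mapping keyed by `(topic, version)` is hypothetical, and so are the function names and the topic name in the usage note.

```python
import struct

VERSION_FMT = ">H"  # assumption: 2-byte unsigned big-endian version number
VERSION_LEN = struct.calcsize(VERSION_FMT)


def encode_versioned(version: int, avro_bytes: bytes) -> bytes:
    """Prefix the already-serialized Avro payload with the version number."""
    return struct.pack(VERSION_FMT, version) + avro_bytes


def decode_versioned(topic: str, message: bytes, schemas: dict) -> tuple:
    """Return (schema, Avro payload) for a message read from `topic`.

    `schemas` is a hypothetical mapping from (topic, version) to a schema.
    """
    (version,) = struct.unpack_from(VERSION_FMT, message)
    return schemas[(topic, version)], message[VERSION_LEN:]
```

For example, `encode_versioned(3, payload)` adds 2 bytes per message, against 16 bytes for a full 128-bit MD5 or 4 bytes for the truncated MD5 prefix. The trade-off is the one noted in the thread: someone has to assign and track the version numbers per topic.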