Kafka user mailing list

Re: Versioning Schemas
I've done this in the past, and it worked out well. I stored the Avro
schemas in ZooKeeper, each with an integer id, and prefixed each message
with that id. When you register a new schema, you have to make sure that
it resolves against the current version (ResolvingDecoder helps with this).
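David's approach (a registry mapping integer ids to schemas, plus an id prefix on every message) might be sketched as follows. This is a minimal sketch under stated assumptions: an in-memory dict stands in for ZooKeeper, the id is written as a 4-byte big-endian integer, and the names `SchemaRegistry`, `frame` and `unframe` are hypothetical. The compatibility check on registration, which David does with Avro's ResolvingDecoder in Java, is only marked with a comment.

```python
import struct

class SchemaRegistry:
    """Hypothetical in-memory stand-in for a ZooKeeper-backed schema registry."""

    def __init__(self):
        self._by_id = {}   # integer id -> Avro schema JSON text
        self._next_id = 0

    def register(self, schema_json: str) -> int:
        # A real registry would first verify that the new schema resolves
        # against the current version (omitted in this sketch).
        schema_id = self._next_id
        self._by_id[schema_id] = schema_json
        self._next_id += 1
        return schema_id

    def lookup(self, schema_id: int) -> str:
        return self._by_id[schema_id]

# Assumed header: 4-byte big-endian unsigned schema id.
HEADER = struct.Struct(">I")

def frame(schema_id: int, avro_bytes: bytes) -> bytes:
    """Prefix Avro-encoded bytes with the schema id."""
    return HEADER.pack(schema_id) + avro_bytes

def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, avro_bytes)."""
    (schema_id,) = HEADER.unpack_from(message, 0)
    return schema_id, message[HEADER.size:]
```

A consumer calls `unframe`, fetches the writer schema with `registry.lookup(schema_id)` (caching it locally), and then decodes the remaining bytes with that schema.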

-David

On 6/13/13 4:07 AM, Shone Sadler wrote:
> Thanks Jun & Phil!
>
> Shone
>
>
> On Thu, Jun 13, 2013 at 12:00 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
>> Yes, we just have a customized encoder that writes the first 4 bytes of
>> the schema's MD5 hash, followed by the Avro bytes.
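The framing Jun describes (the first 4 bytes of the schema's MD5, then the Avro bytes) could look like this sketch. The function names are hypothetical, and the sketch assumes the MD5 is taken over the schema's JSON text exactly as given; the reply does not say which form of the schema is hashed. Because a 4-byte prefix is only 32 bits, registration here refuses a new schema whose prefix collides with a different, already registered one.

```python
import hashlib

def fingerprint4(schema_json: str) -> bytes:
    # First 4 bytes of the MD5 digest of the schema text (assumed: hashed as given).
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:4]

def register(schemas_by_fp: dict[bytes, str], schema_json: str) -> bytes:
    """Store a schema under its 4-byte fingerprint, refusing prefix collisions."""
    fp = fingerprint4(schema_json)
    existing = schemas_by_fp.get(fp)
    if existing is not None and existing != schema_json:
        raise ValueError("4-byte MD5 prefix collides with another schema")
    schemas_by_fp[fp] = schema_json
    return fp

def encode_md5(schema_json: str, avro_bytes: bytes) -> bytes:
    """Message layout: 4-byte MD5 prefix, then the Avro-encoded record."""
    return fingerprint4(schema_json) + avro_bytes

def decode_md5(message: bytes, schemas_by_fp: dict[bytes, str]) -> tuple[str, bytes]:
    """Recover the writer schema from the prefix and return it with the payload."""
    return schemas_by_fp[message[:4]], message[4:]
```

Hashing the schema's Avro Parsing Canonical Form instead of its raw text would keep the id stable across formatting-only changes such as whitespace.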
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Wed, Jun 12, 2013 at 9:50 AM, Shone Sadler <[EMAIL PROTECTED]> wrote:
>>> Jun,
>>> I like the idea of an explicit version field if the schema can be
>>> derived from the topic name itself. The storage (say 1-4 bytes) would
>>> require less overhead than a 128-bit MD5, at the added cost of managing
>>> the version number.
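The alternative Shone raises, a small version number whose meaning is scoped to the topic, might be sketched as follows. The topic name, the schemas, and the one-byte version header are all hypothetical; the point is that the header costs one byte instead of the 16 bytes of a full 128-bit MD5, while someone now has to maintain the per-topic list of versions.

```python
# Hypothetical per-topic schema history: the list index is the version number.
TOPIC_SCHEMAS: dict[str, list[str]] = {
    "page-views": [
        '{"type":"record","name":"PageView","fields":[{"name":"url","type":"string"}]}',
        '{"type":"record","name":"PageView","fields":[{"name":"url","type":"string"},'
        '{"name":"ts","type":"long","default":0}]}',
    ],
}

def encode_versioned(topic: str, version: int, avro_bytes: bytes) -> bytes:
    """Prefix the payload with a one-byte, topic-scoped schema version."""
    if not 0 <= version < min(len(TOPIC_SCHEMAS[topic]), 256):
        raise ValueError(f"unknown version {version} for topic {topic!r}")
    return bytes([version]) + avro_bytes

def decode_versioned(topic: str, message: bytes) -> tuple[str, bytes]:
    """Look up the writer schema from the topic and the leading version byte."""
    version = message[0]
    return TOPIC_SCHEMAS[topic][version], message[1:]
```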
>>>
>>> Is it correct to assume, then, that your applications use two schemas:
>>> a system-level schema to deserialize the schema id and the bytes of the
>>> application message, and a second, application schema to deserialize
>>> those bytes?
>>>
>>> Thanks again!
>>> Shone
>>>
>>>
>>> On Wed, Jun 12, 2013 at 11:31 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>>
>>>> Actually, our schema id is currently the MD5 of the schema itself. I'm
>>>> not fully sure how this compares with an explicit version field in the
>>>> schema.
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>>
>>>> On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> At LinkedIn, we are using option 2.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>>>
>>>>> On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> After doing some searching on the mailing list for best practices on
>>>>>> integrating Avro with Kafka, there appear to be at least three options
>>>>>> for integrating the Avro schema: 1) embedding the entire schema within
>>>>>> the message, 2) embedding a unique identifier for the schema in the
>>>>>> message, and 3) deriving the schema from the topic/resource name.
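To make the efficiency comparison between the three options concrete, the sketch below builds the per-message header each option would put in front of the Avro payload. The layouts (a 4-byte length before an embedded schema, a 4-byte schema id) and the example schema are assumptions for illustration, not a format described in this thread.

```python
import struct

# Hypothetical example schema (105 characters of JSON).
EXAMPLE_SCHEMA = ('{"type":"record","name":"PageView","fields":'
                  '[{"name":"url","type":"string"},{"name":"ts","type":"long"}]}')

def option1_header(schema_json: str) -> bytes:
    # Option 1: embed the whole schema, length-prefixed so the reader knows
    # where the schema ends and the Avro payload begins (assumed layout).
    raw = schema_json.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw

def option2_header(schema_id: int) -> bytes:
    # Option 2: a fixed-width schema identifier (assumed 4-byte integer).
    return struct.pack(">I", schema_id)

def option3_header() -> bytes:
    # Option 3: nothing on the wire; the reader derives the schema from the topic.
    return b""
```

With this example schema, option 1 repeats more than 100 bytes of schema text in every message, option 2 adds a constant 4 bytes, and option 3 adds nothing but ties each topic to a known schema.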
>>>>>>
>>>>>> Option 2 appears to be the best option in terms of both efficiency and
>>>>>> flexibility. However, from a programming perspective it complicates
>>>>>> the solution with the need for both an envelope schema (containing a
>>>>>> "schema id" field and a "bytes" field for the record data) and a
>>>>>> message schema (containing the application-specific message fields).
>>>>>> This requires two levels of serialization/deserialization.
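The envelope approach described above can be illustrated without an Avro library by hand-encoding the envelope record with Avro's binary encoding: an `int` is written as a zig-zag, base-128 varint, `bytes` as a varint length followed by the raw bytes, and record fields are concatenated in schema order. The envelope schema assumed here is `{"type":"record","name":"Envelope","fields":[{"name":"schema_id","type":"int"},{"name":"bytes","type":"bytes"}]}`; the record name, field names, and helper functions are hypothetical. The inner payload must first be Avro-encoded with the application schema, which is the second level of serialization.

```python
def write_varint(n: int) -> bytes:
    """Avro int/long: zig-zag map to unsigned, then base-128, low group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # continuation bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)

def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one zig-zag varint at buf[pos]; return (value, next position)."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos

def encode_envelope(schema_id: int, inner: bytes) -> bytes:
    # Fields in schema order: schema_id (int), then bytes (length + raw bytes).
    return write_varint(schema_id) + write_varint(len(inner)) + inner

def decode_envelope(buf: bytes) -> tuple[int, bytes]:
    schema_id, pos = read_varint(buf, 0)
    length, pos = read_varint(buf, pos)
    return schema_id, buf[pos:pos + length]

# Level 1: the application record, here just the Avro string "foo".
inner = write_varint(3) + b"foo"      # b"\x06foo"
# Level 2: wrap it in the envelope with schema id 42.
message = encode_envelope(42, inner)  # b"T\x08\x06foo"
```

On the wire the envelope adds only a varint schema id and a varint length around the inner bytes, so its main cost is the extra encode/decode step rather than message size.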
>>>>>>
>>>>>> Questions:
>>>>>> 1) How are others dealing with versioning of schemas?
>>>>>> 2) Is there a more elegant way to embed a schema id in an Avro
>>>>>> message? (I am new to both ;-)
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>> Shone
>>>>>>
>>>>>