We define the LinkedIn Kafka message to have a magic byte (indicating Avro serialization), followed by an MD5 header and then the payload. The Hadoop consumer reads the MD5, looks up the schema in the repository, and deserializes the message.
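A minimal Python sketch of the consumer side of that layout, for readers following along. It only splits the frame and looks up the schema; the Avro decoding step is left out. The magic byte value, the function names, and the use of a plain dict as the schema repository are assumptions for illustration and are not taken from LinkedIn's code.

```python
MAGIC_BYTE = 0x00   # hypothetical value; the thread does not state the actual byte
MD5_LENGTH = 16     # an MD5 digest is always 16 bytes


def split_message(message: bytes):
    """Split a framed message into (magic, schema_md5, avro_payload)."""
    if len(message) < 1 + MD5_LENGTH:
        raise ValueError("message too short for magic byte plus MD5 header")
    magic = message[0]
    if magic != MAGIC_BYTE:
        raise ValueError(f"unsupported magic byte: {magic}")
    schema_md5 = message[1:1 + MD5_LENGTH]
    payload = message[1 + MD5_LENGTH:]
    return magic, schema_md5, payload


def lookup_schema(repository: dict, schema_md5: bytes) -> str:
    """Return the schema text stored under this MD5 (dict stands in for the repository)."""
    return repository[schema_md5]
```

The payload returned here is still Avro binary; a real consumer would hand it, together with the looked-up schema, to an Avro decoder.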
Thanks, Neha

On Wed, Aug 21, 2013 at 8:15 PM, Mark <[EMAIL PROTECTED]> wrote:
So the only point of the magic byte is to indicate that the rest of the message is Avro encoded? I noticed that in Camus a 4 byte int id of the schema is written instead of the 16 byte SHA. Is this the new preferred way? Which is compatible with https://issues.apache.org/jira/browse/AVRO-1124?
On Aug 21, 2013, at 8:38 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
The point of the magic byte is to indicate the current version of the message format. One part of the format is the fact that it is Avro encoded. I'm not sure how Camus gets a 4 byte id, but at LinkedIn we use the 16 byte MD5 hash of the schema. Since AVRO-1124 is not resolved yet, I'm not sure if I can comment on the compatibility just yet.
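To make the size difference between the two header styles concrete, here is a rough Python sketch of the producer side. Both function names are hypothetical. How the schema text is normalized before hashing is not stated in this thread, so hashing the raw UTF-8 JSON is an assumption, as is the big-endian encoding of the 4 byte id.

```python
import hashlib
import struct


def frame_with_md5(schema_json: str, payload: bytes, magic: int = 0) -> bytes:
    """Prefix the payload with a magic byte and the 16 byte MD5 of the schema text."""
    # Assumption: the digest is taken over the raw UTF-8 schema JSON.
    digest = hashlib.md5(schema_json.encode("utf-8")).digest()
    return bytes([magic]) + digest + payload


def frame_with_int_id(schema_id: int, payload: bytes, magic: int = 0) -> bytes:
    """Prefix the payload with a magic byte and a 4 byte schema id."""
    # Assumption: the id is a big-endian signed 32-bit integer.
    return bytes([magic]) + struct.pack(">i", schema_id) + payload
```

Under these assumptions the MD5 variant adds 17 bytes per message and the int id variant adds 5. The MD5 can be computed from the schema alone, while an int id has to be assigned by whatever service hands out ids.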
Thanks, Neha

On Wed, Aug 21, 2013 at 9:00 PM, Mark <[EMAIL PROTECTED]> wrote:
… or is the payload of the message prepended with a magic byte followed by the SHA?
On Aug 22, 2013, at 9:49 AM, Mark <[EMAIL PROTECTED]> wrote: