Avro >> mail # user >> Is Avro right for me?

Re: Is Avro right for me?
I don't have experience with Flume, so I can't comment on that. At LinkedIn
we ship logs around by sending Avro-encoded messages to Kafka (
http://kafka.apache.org/). Kafka is nice, it scales very well and gives a
great deal of flexibility — logs can be consumed by any number of
independent consumers, consumers can catch up on a backlog if they're
disconnected for a while, and it comes with Hadoop import out of the box.

(RabbitMQ is designed more for use cases where each message corresponds to a
task that needs to be performed by a worker. IMHO Kafka is a better fit for
logs, which are more stream-like.)
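To make the log-versus-queue distinction concrete, here is a toy in-memory model. It is a hypothetical sketch, not Kafka's or RabbitMQ's API. In the log, every consumer keeps its own read offset, so independent consumers each see every message and a disconnected one can catch up on the backlog. In the work queue, each message goes to exactly one worker and is then gone.

```python
from collections import deque


class ToyLog:
    """Hypothetical append-only log: each consumer tracks its own offset."""

    def __init__(self):
        self.entries = []
        self.offsets = {}  # consumer name -> index of the next entry to read

    def append(self, msg):
        self.entries.append(msg)

    def poll(self, consumer):
        # A consumer seen for the first time starts at offset 0, so it
        # replays the whole backlog; later polls return only new entries.
        start = self.offsets.get(consumer, 0)
        batch = self.entries[start:]
        self.offsets[consumer] = len(self.entries)
        return batch


class ToyWorkQueue:
    """Hypothetical work queue: each message is handed to one worker only."""

    def __init__(self):
        self.pending = deque()

    def put(self, msg):
        self.pending.append(msg)

    def take(self):
        # Removing the message means no other worker will ever see it.
        return self.pending.popleft() if self.pending else None
```

With three messages appended to a `ToyLog`, both a `"hadoop-import"` consumer and a `"dashboard"` consumer receive all three; with the same messages in a `ToyWorkQueue`, successive `take()` calls hand each one out once.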

With any message broker, you'll need to somehow tag each message with the
schema that was used to encode it. You could include the full schema with
every message, but unless you have very large messages, that would be a
huge overhead. Better to give each version of your schema a sequential
version number, or hash the schema, and include the version number/hash in
each message. You can then keep a repository of schemas for resolving those
version numbers or hashes – simply in files that you distribute to all
producers/consumers, or in a simple REST service like
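A minimal sketch of the hash-tagging idea, assuming a hypothetical in-memory registry and an 8-byte tag prefixed to each message. The payload bytes stand in for an already Avro-encoded record. The schema is hashed from a normalized JSON rendering (sorted keys, no whitespace) so that key order does not change the tag. Note that this is NOT the Avro specification's own fingerprinting, which is defined over the schema's Parsing Canonical Form; it only illustrates the "hash plus repository" approach. A sequential version number packed into a few bytes would work the same way.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> bytes:
    # Assumption: normalize by dumping JSON with sorted keys and no spaces,
    # then keep the first 8 bytes of a SHA-256 digest as the tag.
    normalized = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).digest()[:8]


class SchemaRegistry:
    """Hypothetical repository mapping fingerprints back to schemas."""

    def __init__(self):
        self._by_fp = {}

    def register(self, schema: dict) -> bytes:
        fp = schema_fingerprint(schema)
        self._by_fp[fp] = schema
        return fp

    def lookup(self, fp: bytes) -> dict:
        return self._by_fp[fp]


def tag_message(fp: bytes, payload: bytes) -> bytes:
    # Producer side: 8 bytes of overhead instead of the full schema text.
    return fp + payload


def untag_message(message: bytes):
    # Consumer side: split off the tag, then look up the writer's schema.
    return message[:8], message[8:]
```

A consumer would call `untag_message`, pass the tag to `SchemaRegistry.lookup`, and decode the remaining bytes with the schema it gets back.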

Hope that helps,
On 26 May 2013 17:39, Mark <[EMAIL PROTECTED]> wrote:

> Yes, our central server would be Hadoop.
> Exactly how would this work with Flume? Would I write Avro to a file
> source, which Flume would then ship over to one of our collectors, or is
> there a better/native way? Would I have to include the schema in each
> event? FYI, we would be doing this primarily from a Rails application.
> Does anyone ever use Avro with a message bus like RabbitMQ?
> On May 23, 2013, at 9:16 PM, Sean Busbey <[EMAIL PROTECTED]> wrote:
> Yep. Avro would be great at that (provided your central consumer is
> Avro-friendly, like a Hadoop system). Make sure that all of your schemas
> have default values defined for fields so that schema evolution will be
> easier in the future.
> On Thu, May 23, 2013 at 4:29 PM, Mark <[EMAIL PROTECTED]> wrote:
>> We're thinking about generating logs and events with Avro and shipping
>> them to a central collector service via Flume. Is this a valid use case?
> --
> Sean
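Why defaults matter for schema evolution can be seen in a simplified sketch of Avro's record-resolution rules, applied here to already-decoded Python dicts rather than binary data (the function name and field lists are hypothetical, and aliases and type promotion are ignored). A field the reader declares but the writer lacks takes the reader's default; without a default, resolution fails; a field only the writer has is dropped.

```python
def resolve_record(datum: dict, writer_fields: list, reader_fields: list) -> dict:
    """Simplified sketch of Avro record resolution on decoded dicts."""
    writer_names = {f["name"] for f in writer_fields}
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_names:
            # Present in both schemas: keep the writer's value.
            result[name] = datum[name]
        elif "default" in field:
            # New in the reader's schema: fall back to its default.
            result[name] = field["default"]
        else:
            raise ValueError(
                f"reader field {name!r} is missing from the writer schema "
                "and has no default"
            )
    # Writer-only fields are never copied, i.e. they are ignored.
    return result


# v1 of a hypothetical log-event schema, and a v2 that adds a field.
V1_FIELDS = [{"name": "url", "type": "string"},
             {"name": "session", "type": "string"}]
V2_FIELDS = [{"name": "url", "type": "string"},
             {"name": "referrer", "type": ["null", "string"], "default": None}]
```

Reading an old v1 event with the v2 schema yields `{"url": ..., "referrer": None}`; had `referrer` been declared without a default, the same read would raise.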