Avro user mailing list: Is Avro right for me?


Mark 2013-05-23, 20:29
Sean Busbey 2013-05-24, 04:16
Mark 2013-05-26, 16:39
Martin Kleppmann 2013-05-27, 13:25
Re: Is Avro right for me?
What's more, there are examples and support for Kafka, but not so much for
Flume.
On Mon, May 27, 2013 at 6:25 AM, Martin Kleppmann <[EMAIL PROTECTED]> wrote:

> I don't have experience with Flume, so I can't comment on that. At
> LinkedIn we ship logs around by sending Avro-encoded messages to Kafka (
> http://kafka.apache.org/). Kafka is nice, it scales very well and gives a
> great deal of flexibility — logs can be consumed by any number of
> independent consumers, consumers can catch up on a backlog if they're
> disconnected for a while, and it comes with Hadoop import out of the box.
>
> (RabbitMQ is designed more for use cases where each message corresponds to
> a task that needs to be performed by a worker. IMHO Kafka is a better fit
> for logs, which are more stream-like.)
>
> With any message broker, you'll need to somehow tag each message with the
> schema that was used to encode it. You could include the full schema with
> every message, but unless you have very large messages, that would be a
> huge overhead. Better to give each version of your schema a sequential
> version number, or hash the schema, and include the version number/hash in
> each message. You can then keep a repository of schemas for resolving those
> version numbers or hashes – simply in files that you distribute to all
> producers/consumers, or in a simple REST service like
> https://issues.apache.org/jira/browse/AVRO-1124
>
> Hope that helps,
> Martin
>
>
> On 26 May 2013 17:39, Mark <[EMAIL PROTECTED]> wrote:
>
>> Yes, our central server would be Hadoop.
>>
>> Exactly how would this work with Flume? Would I write Avro to a file
>> source which Flume would then ship over to one of our collectors, or is
>> there a better/native way? Would I have to include the schema in each
>> event? FYI, we would be doing this primarily from a Rails application.
>>
>> Does anyone ever use Avro with a message bus like RabbitMQ?
>>
>> On May 23, 2013, at 9:16 PM, Sean Busbey <[EMAIL PROTECTED]> wrote:
>>
>> Yep. Avro would be great at that (provided your central consumer is
>> Avro-friendly, like a Hadoop system). Make sure that all of your schemas have
>> default values defined for fields so that schema evolution will be easier
>> in the future.
>>
>>
>> On Thu, May 23, 2013 at 4:29 PM, Mark <[EMAIL PROTECTED]> wrote:
>>
>>> We're thinking about generating logs and events with Avro and shipping
>>> them to a central collector service via Flume. Is this a valid use case?
>>>
>>>
>>
>>
>> --
>> Sean
>>
>>
>>
>
--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
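
As a concrete illustration of the approach Martin outlines above (and of Sean's
earlier advice to give every field a default), here is a minimal sketch in
Python: each log event is Avro-encoded, prefixed with a fingerprint of the
writer schema, and published to Kafka, while a small fingerprint-to-schema map
stands in for a schema repository such as the AVRO-1124 service. The LogEvent
schema, the "logs" topic, the broker address, and the MD5-based fingerprint are
illustrative assumptions, not details taken from the thread; the library calls
come from the standard avro and kafka-python packages.

# Minimal sketch of the "fingerprint + schema repository" idea from the thread.
# Assumptions (not from the thread): the LogEvent schema, the "logs" topic,
# and a Kafka broker on localhost:9092.
import hashlib
import io

import avro.io
import avro.schema
from kafka import KafkaProducer

# Writer schema; fields that may evolve get defaults, per Sean's advice.
LOG_SCHEMA_JSON = """
{
  "type": "record",
  "name": "LogEvent",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "level",     "type": "string", "default": "INFO"},
    {"name": "message",   "type": "string", "default": ""}
  ]
}
"""
SCHEMA = avro.schema.parse(LOG_SCHEMA_JSON)

# 16-byte fingerprint of the writer schema, prepended to every message.
# (A real deployment would fingerprint the schema's canonical form rather
# than the raw JSON text, which is whitespace-sensitive.)
FINGERPRINT = hashlib.md5(LOG_SCHEMA_JSON.encode("utf-8")).digest()

# The "repository": here just a dict shipped with producers/consumers; a REST
# service like the one in AVRO-1124 would play this role at larger scale.
SCHEMA_REPOSITORY = {FINGERPRINT: SCHEMA}

def encode(event):
    """Serialize one event as [16-byte fingerprint][Avro binary body]."""
    buf = io.BytesIO()
    buf.write(FINGERPRINT)
    avro.io.DatumWriter(SCHEMA).write(event, avro.io.BinaryEncoder(buf))
    return buf.getvalue()

def decode(payload):
    """Look up the writer schema by fingerprint, then deserialize the body."""
    writer_schema = SCHEMA_REPOSITORY[payload[:16]]
    decoder = avro.io.BinaryDecoder(io.BytesIO(payload[16:]))
    return avro.io.DatumReader(writer_schema).read(decoder)

if __name__ == "__main__":
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("logs", encode({"timestamp": 1369683900000,
                                  "level": "WARN",
                                  "message": "disk almost full"}))
    producer.flush()

On the consumer side, decode() would normally also pass the consumer's own
reader schema to DatumReader so Avro's schema resolution can reconcile old and
new schema versions; that is where the field defaults pay off.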
Other replies in this thread:
Stefan Krawczyk 2013-05-27, 19:00
Martin Kleppmann 2013-05-27, 19:34
Mark 2013-05-28, 22:38
Martin Kleppmann 2013-05-29, 10:16
Mike Percy 2013-05-29, 00:02
Mark 2013-05-29, 16:30
Mike Percy 2013-05-30, 03:02
Mark 2013-06-05, 03:10
Felix GV 2013-06-06, 18:51
Felix GV 2013-06-06, 19:09
Mark 2013-06-04, 19:57