Avro, mail # user - Is Avro right for me?


Mark 2013-05-23, 20:29
Sean Busbey 2013-05-24, 04:16
Mark 2013-05-26, 16:39
Martin Kleppmann 2013-05-27, 13:25
Russell Jurney 2013-05-27, 18:08
Stefan Krawczyk 2013-05-27, 19:00
Martin Kleppmann 2013-05-27, 19:34
Re: Is Avro right for me?
Mark 2013-05-28, 22:38
Thanks for all of the information.

I actually looked into Kafka quite some time ago; I think we passed on it because it didn't have much Ruby support at the time (that may have changed by now).
On May 27, 2013, at 12:34 PM, Martin Kleppmann <[EMAIL PROTECTED]> wrote:

> On 27 May 2013 20:00, Stefan Krawczyk <[EMAIL PROTECTED]> wrote:
> So it's up to you what you stick into the body of that Avro event. It could just be JSON, or it could be your own serialized Avro event - and as far as I understand, serialized Avro always has the schema with it (right?).
>
> In an Avro data file, yes, because you just need to specify the schema once, followed by (say) a million records that all use the same schema. And in an RPC context, you can negotiate the schema once per connection. But when using a message broker, you're serializing individual records and don't have an end-to-end connection with the consumer, so you'd need to include the schema with every single message.
>
> It probably doesn't make sense to include the full schema with every one, as a typical schema might be 2 kB whereas a serialized record might be less than 100 bytes (numbers obviously vary wildly by application), so the schema size would dominate. Hence my suggestion of including a schema version number or hash with every message.
>
> Be aware that Flume doesn't have great support for languages outside of the JVM.
>
> The same caveat unfortunately applies with Kafka too. There are clients for non-JVM languages, but they lack important features, so I would recommend using the official JVM client (if your application is non-JVM you could simply pipe your application's stdout into the Kafka producer, or vice versa on the consumer side).
>
> Martin
>

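Martin's suggestion of tagging each message with a schema version number or hash, rather than the full ~2 kB schema, can be sketched roughly as follows. This is a minimal illustration in Python: the fingerprint is an MD5 hash of the schema JSON, a plain dict stands in for whatever shared schema store the producer and consumer agree on, and the payload is JSON purely for readability (with Avro it would be the binary-encoded datum). All names here are invented for the example.

```python
import hashlib
import json

# Hypothetical Avro schema; the schema's JSON text is what gets fingerprinted.
SCHEMA_JSON = json.dumps({
    "type": "record", "name": "Event",
    "fields": [{"name": "user", "type": "string"},
               {"name": "action", "type": "string"}],
}, sort_keys=True)

def fingerprint(schema_json):
    # 8-byte fingerprint of the canonical schema text.
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:8]

# Stand-in for a shared schema registry both producer and consumer can reach.
SCHEMA_STORE = {fingerprint(SCHEMA_JSON): SCHEMA_JSON}

def frame(payload, schema_json):
    # Message = 8-byte schema fingerprint + serialized record,
    # so the per-message overhead is 8 bytes instead of the whole schema.
    return fingerprint(schema_json) + payload

def unframe(message):
    fp, payload = message[:8], message[8:]
    schema_json = SCHEMA_STORE[fp]  # fetch the full schema once, by hash
    return schema_json, payload

record = json.dumps({"user": "mark", "action": "login"}).encode("utf-8")
msg = frame(record, SCHEMA_JSON)
schema, payload = unframe(msg)
```

The size argument above is the point: an 8-byte prefix on a ~100-byte record is negligible, whereas repeating a 2 kB schema per message would dominate the payload.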
Martin Kleppmann 2013-05-29, 10:16
Mike Percy 2013-05-29, 00:02
Mark 2013-05-29, 16:30
Mike Percy 2013-05-30, 03:02
Mark 2013-06-05, 03:10
Felix GV 2013-06-06, 18:51
Felix GV 2013-06-06, 19:09
Mark 2013-06-04, 19:57