Avro >> mail # user >> Is Avro right for me?


Mark 2013-05-23, 20:29
Sean Busbey 2013-05-24, 04:16
Mark 2013-05-26, 16:39
Martin Kleppmann 2013-05-27, 13:25
Russell Jurney 2013-05-27, 18:08
Stefan Krawczyk 2013-05-27, 19:00

Martin Kleppmann 2013-05-27, 19:34
Re: Is Avro right for me?
Thanks for all of the information.

I actually looked into Kafka quite some time ago, and I think we passed on it because it didn't have much Ruby support (that may have changed by now).
On May 27, 2013, at 12:34 PM, Martin Kleppmann <[EMAIL PROTECTED]> wrote:

> On 27 May 2013 20:00, Stefan Krawczyk <[EMAIL PROTECTED]> wrote:
> So it's up to you what you stick into the body of that Avro event. It could just be json, or it could be your own serialized Avro event - and as far as I understand serialized Avro always has the schema with it (right?).
>
> In an Avro data file, yes, because you just need to specify the schema once, followed by (say) a million records that all use the same schema. And in an RPC context, you can negotiate the schema once per connection. But when using a message broker, you're serializing individual records and don't have an end-to-end connection with the consumer, so you'd need to include the schema with every single message.
>
> It probably doesn't make sense to include the full schema with every one, as a typical schema might be 2 kB whereas a serialized record might be less than 100 bytes (numbers obviously vary wildly by application), so the schema size would dominate. Hence my suggestion of including a schema version number or hash with every message.
>
> Be aware that Flume doesn't have great support for languages outside of the JVM.
>
> The same caveat unfortunately applies with Kafka too. There are clients for non-JVM languages, but they lack important features, so I would recommend using the official JVM client (if your application is non-JVM you could simply pipe your application's stdout into the Kafka producer, or vice versa on the consumer side).
>
> Martin
>
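Martin's suggestion above, prefixing each message with a schema fingerprint or version number instead of the full ~2 kB schema, could be sketched roughly as follows. The 16-byte MD5 framing, the in-memory registry, and the names used are illustrative assumptions for this thread, not part of Avro's wire format; the payload bytes stand in for the Avro binary encoding of a record.

```python
import hashlib
import json

# A hypothetical schema, serialized canonically so its hash is stable.
SCHEMA = json.dumps({
    "type": "record", "name": "Event",
    "fields": [{"name": "id", "type": "long"}],
}, sort_keys=True)

def fingerprint(schema_json):
    # 16-byte MD5 of the schema text (an assumed convention, not Avro's).
    return hashlib.md5(schema_json.encode("utf-8")).digest()

# Stand-in for a shared schema registry that both producer and consumer
# can query by fingerprint.
REGISTRY = {fingerprint(SCHEMA): SCHEMA}

def frame(payload, schema_json):
    # Message on the broker = 16-byte fingerprint + serialized record bytes.
    return fingerprint(schema_json) + payload

def unframe(message):
    fp, payload = message[:16], message[16:]
    # Look up the writer's schema, then decode the payload with it.
    return REGISTRY[fp], payload
```

With ~100-byte records this adds 16 bytes per message instead of the full schema, which is the trade-off Martin describes.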

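The stdout-piping workaround Martin mentions for non-JVM applications might look like the sketch below; the script names, topic, and addresses are placeholders, and the exact console-producer flags depend on the Kafka version in use:

```shell
# Producer side: a non-JVM app writes one message per line to stdout,
# and Kafka's stock console producer forwards them to the broker.
./my_ruby_app.rb | kafka-console-producer.sh --broker-list localhost:9092 --topic events

# Consumer side, reversed: pipe the console consumer into the app's stdin.
kafka-console-consumer.sh --zookeeper localhost:2181 --topic events | ./my_ruby_consumer.rb
```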
Martin Kleppmann 2013-05-29, 10:16
Mike Percy 2013-05-29, 00:02
Mark 2013-05-29, 16:30
Mike Percy 2013-05-30, 03:02
Mark 2013-06-05, 03:10
Felix GV 2013-06-06, 18:51
Felix GV 2013-06-06, 19:09
Mark 2013-06-04, 19:57