We keep running into problems (e.g. KAFKA-946) that are basically due to the fact that most of the Kafka committers aren't Hadoop developers and aren't doing a good job of maintaining this code (keeping it tested, improving it, documenting it, writing tutorials, moving it over to the more modern APIs, getting it working with newer Hadoop versions, etc.).
A couple of options:
1. We could try to get someone in the Kafka community (either a current committer or not) who would adopt this as their baby (it's not much code).
2. We could just let Camus take over this functionality. They already have a more sophisticated consumer and the producer is pretty minimal.
So are there any people who would like to adopt the current Hadoop contrib code?
Conversely would it be possible to provide the same or similar functionality in Camus and just delete these?
If the Hadoop consumer/producer use case will remain relevant for Kafka (I assume it will), it would make sense to have the core components (Kafka input/output format at least) as part of Kafka so that they could be built, tested and versioned together to maintain compatibility. This would also make it easier to build custom MR jobs on top of Kafka, rather than having to decouple stuff from Camus. It would also be less confusing for users, at least when they start using Kafka.
Camus could use those instead of providing its own.
That being said, we did some work on the consumer side (0.8 and the new(er) MR API). We could probably try to rewrite them to use Camus or fix Camus or whatever, but please consider this alternative as well.
On 7/3/13 11:06 AM, "Sam Meder" <[EMAIL PROTECTED]> wrote:
I guess I am more concerned about the long term than the short term. I think if you guys want to have all the Hadoop+Kafka stuff, then we should move the producer there, and it sounds like it would be possible to get similar functionality from the existing consumer code. I am not in a rush; I just want to figure out a plan.
The alternative is if there is anyone who is interested in maintaining this stuff in Kafka. The current state where it is poorly documented and maintained is not good.
-Jay

On Wed, Jul 3, 2013 at 1:51 PM, Ken Goodhope <[EMAIL PROTECTED]> wrote:
We can easily make a Camus configuration that would mimic the functionality of the Hadoop consumer in contrib. It may require the addition of a BinaryWritable decoder and a couple of minor code changes. As for the producer, we don't have anything in Camus that does what it does. But maybe we should at some point. In the meantime, Gaurav is going to take a look at what is in contrib and see if it is easily fixed. I have a feeling it will probably take minimal effort and allow us to kick the can down the road till we get more time to properly address this.
@Jay, would this work for now?
Ken

On Wed, Jul 3, 2013 at 10:57 AM, Felix GV <[EMAIL PROTECTED]> wrote:
The advantages of Camus compared to the contrib consumer are the following (but perhaps I'm forgetting some):
- The ability to fetch all/many topics in one job (Map Reduce can otherwise introduce a lot of overhead for small topics).
- Smarter load balancing of topic partitions across tasks.
- Built-in error detection and logging.
- Support for speculative execution.
- Automatic and complete handling of incremental imports (the contribs need a bit of hand holding).
- Various configuration parameters for bucket sizes, etc.
- Automatic discovery of new topics (if you use the external avro schema repo).
- Automatic reporting of metrics (if you use Kafka Audit).
However, Camus is currently pretty coupled with avro, and to a lesser extent with certain conventions within avro schemas, whereas the contrib is pretty much raw.
Hopefully, that answers your question (?)
Felix

On Wed, Jul 3, 2013 at 4:20 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote: