Re: Kafka/Hadoop consumers and producers
Felix GV 2013-07-04, 17:48
The advantages of Camus compared to the contrib consumer are the following
(but perhaps I'm forgetting some):
- The ability to fetch all/many topics in one job (MapReduce can
otherwise introduce a lot of overhead for small topics).
- Smarter load balancing of topic partitions across tasks.
- Built-in error detection and logging.
- Support for speculative execution.
- Automatic and complete handling of incremental imports (the contrib
code needs a bit of hand-holding).
- Various configuration parameters for bucket sizes, etc.
- Automatic discovery of new topics (if you use the external Avro schema
- Automatic reporting of metrics (if you use Kafka Audit).
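To make the load-balancing point above a bit more concrete, here is a minimal sketch of one way partitions could be spread across tasks. This is not Camus's actual code or algorithm; it is a generic greedy "largest first" assignment, and the function name `assign_partitions` and the idea of having a per-partition size estimate available are assumptions made purely for illustration.

```python
import heapq


def assign_partitions(partition_sizes, num_tasks):
    """Greedy largest-first assignment of topic partitions to tasks.

    partition_sizes: dict mapping (topic, partition) -> estimated bytes to pull
                     (hypothetical input; how the estimate is obtained is
                     outside the scope of this sketch).
    Returns a list of num_tasks lists of (topic, partition) keys.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")

    # Min-heap of (current load, task index); the lightest task pops first.
    heap = [(0, i) for i in range(num_tasks)]
    heapq.heapify(heap)
    tasks = [[] for _ in range(num_tasks)]

    # Place the largest partitions first so the small ones fill the gaps.
    for key, size in sorted(partition_sizes.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        tasks[idx].append(key)
        heapq.heappush(heap, (load + size, idx))
    return tasks


if __name__ == "__main__":
    sizes = {("a", 0): 100, ("b", 0): 10, ("b", 1): 10, ("c", 0): 80}
    for i, assigned in enumerate(assign_partitions(sizes, 2)):
        print(i, assigned, sum(sizes[k] for k in assigned))
```

With one large topic and several small ones, a naive one-task-per-topic split would leave most tasks nearly idle, whereas packing by estimated size keeps the per-task load roughly even.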
However, Camus is currently fairly tightly coupled to Avro, and to a lesser
extent to certain conventions within Avro schemas, whereas the contrib
package is pretty much raw.
Hopefully, that answers your question (?)
On Wed, Jul 3, 2013 at 4:20 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
> What is the difference between this project and Camus? What are the
> advantages of each for loading log entries from Kafka into Hadoop?
> On Jul 2, 2013, at 5:01 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > We currently have a contrib package for consuming and producing messages
> > from MapReduce (
> > ).
> > We keep running into problems (e.g. KAFKA-946) that are basically due to
> > the fact that the Kafka committers mostly don't seem to be Hadoop
> > developers and aren't doing a good job of maintaining this code (keeping
> > it tested, improving it, documenting it, writing tutorials, getting it
> > moved over to the more modern APIs, getting it working with newer Hadoop
> > versions, etc.).
> > A couple of options:
> > 1. We could try to get someone in the Kafka community (either a current
> > committer or not) who would adopt this as their baby (it's not much
> > 2. We could just let Camus take over this functionality. They already
> > have a more sophisticated consumer, and the producer is pretty minimal.
> > So are there any people who would like to adopt the current Hadoop
> > code?
> > Conversely, would it be possible to provide the same or similar
> > functionality in Camus and just delete these?
> > -Jay