Kafka >> mail # user >> Re: Kafka/Hadoop consumers and producers


Re: Kafka/Hadoop consumers and producers
Vadim,

The advantages of Camus over the contrib consumer are the following
(though I may be forgetting some):

   - The ability to fetch all/many topics in one job (MapReduce can
   otherwise introduce a lot of overhead for small topics).
   - Smarter load balancing of topic partitions across tasks.
   - Built-in error detection and logging.
   - Support for speculative execution.
   - Automatic and complete handling of incremental imports (the contrib
   code needs a bit of hand-holding).
   - Various configuration parameters for bucket sizes, etc.
   - Automatic discovery of new topics (if you use the external Avro schema
   repo).
   - Automatic reporting of metrics (if you use Kafka Audit).

However, Camus is currently fairly tightly coupled to Avro, and to a lesser
extent to certain conventions within Avro schemas, whereas the contrib
consumer is pretty much raw.

Hopefully that answers your question.

--
Felix
On Wed, Jul 3, 2013 at 4:20 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote:

> Jay,
> What is the difference between this project and Camus? What are the
> advantages of each for loading log entries from Kafka into Hadoop?
>
> Vadim
>
> Sent from my iPhone
>
> On Jul 2, 2013, at 5:01 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > We currently have a contrib package for consuming and producing messages
> > from mapreduce (
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tree;f=contrib;h=e53e1fb34893e733b10ff27e79e6a1dcbb8d7ab0;hb=HEAD
> > ).
> >
> > We keep running into problems (e.g. KAFKA-946) that are basically due to
> > the fact that the Kafka committers mostly don't seem to be Hadoop
> > developers and aren't doing a good job of maintaining this code (keeping
> > it tested, improving it, documenting it, writing tutorials, getting it
> > moved over to the more modern APIs, getting it working with newer Hadoop
> > versions, etc).
> >
> > A couple of options:
> > 1. We could try to get someone in the Kafka community (either a current
> > committer or not) who would adopt this as their baby (it's not much
> > code).
> > 2. We could just let Camus take over this functionality. They already
> > have a more sophisticated consumer and the producer is pretty minimal.
> >
> > So are there any people who would like to adopt the current Hadoop
> > contrib code?
> >
> > Conversely would it be possible to provide the same or similar
> > functionality in Camus and just delete these?
> >
> > -Jay
>

 