Kafka, mail # user - Re: Kafka/Hadoop consumers and producers


Re: Kafka/Hadoop consumers and producers
Felix GV 2013-07-04, 17:48
Vadim,

The advantages of Camus over the contrib consumer include the following
(I may be forgetting some):

   - The ability to fetch all/many topics in one job (MapReduce can
   otherwise introduce a lot of overhead for small topics).
   - Smarter load balancing of topic partitions across tasks.
   - Built-in error detection and logging.
   - Support for speculative execution.
   - Automatic and complete handling of incremental imports (the contrib
   consumer needs a bit of hand-holding).
   - Various configuration parameters for bucket sizes, etc.
   - Automatic discovery of new topics (if you use the external Avro schema
   repo).
   - Automatic reporting of metrics (if you use Kafka Audit).

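To make the load-balancing point concrete, here is a minimal sketch of one common way to spread topic partitions across a fixed number of map tasks. It is not taken from Camus; the function name `assign_partitions`, the size estimates, and the greedy largest-first strategy are all assumptions chosen for illustration.

```python
import heapq


def assign_partitions(partition_sizes, num_tasks):
    """Greedily assign partitions to tasks, largest first.

    partition_sizes: dict mapping a (topic, partition) tuple to an estimated
        amount of data to fetch (hypothetical input, e.g. an offset delta).
    num_tasks: number of map tasks to spread the work over.
    Returns a dict mapping task id to the list of partitions it should read.
    """
    # Min-heap of (current load, task id), so the least-loaded task pops first.
    heap = [(0, task_id) for task_id in range(num_tasks)]
    heapq.heapify(heap)
    assignment = {task_id: [] for task_id in range(num_tasks)}

    # Place the biggest partitions first; small topics fill in the gaps.
    ordered = sorted(partition_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for partition, size in ordered:
        load, task_id = heapq.heappop(heap)
        assignment[task_id].append(partition)
        heapq.heappush(heap, (load + size, task_id))
    return assignment
```

With this kind of scheme, many small topics can share one task instead of each paying the overhead of its own job, while a large topic's partitions are spread across several tasks.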
However, Camus is currently fairly tightly coupled to Avro, and to a lesser
extent to certain conventions within Avro schemas, whereas the contrib
consumer is pretty much raw.

Hopefully that answers your question.

--
Felix

On Wed, Jul 3, 2013 at 4:20 AM, Vadim Keylis <[EMAIL PROTECTED]> wrote:

> Jay,
> What is the difference between this project and Camus? What are the
> advantages of each for loading log entries from Kafka into Hadoop?
>
> Vadim
>
> Sent from my iPhone
>
> On Jul 2, 2013, at 5:01 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > We currently have a contrib package for consuming and producing messages
> > from mapreduce (
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tree;f=contrib;h=e53e1fb34893e733b10ff27e79e6a1dcbb8d7ab0;hb=HEAD
> > ).
> >
> > We keep running into problems (e.g. KAFKA-946) that are basically due to
> > the fact that most of the Kafka committers don't seem to be Hadoop
> > developers and aren't doing a good job of maintaining this code (keeping
> > it tested, improving it, documenting it, writing tutorials, getting it
> > moved over to the more modern APIs, getting it working with newer Hadoop
> > versions, etc).
> >
> > A couple of options:
> > 1. We could try to get someone in the Kafka community (either a current
> > committer or not) who would adopt this as their baby (it's not much
> > code).
> > 2. We could just let Camus take over this functionality. They already
> > have a more sophisticated consumer and the producer is pretty minimal.
> >
> > So are there any people who would like to adopt the current Hadoop
> > contrib code?
> >
> > Conversely, would it be possible to provide the same or similar
> > functionality in Camus and just delete these?
> >
> > -Jay
>