Kafka, mail # dev - Kafka/Hadoop consumers and producers


Re: Kafka/Hadoop consumers and producers
Jay Kreps 2013-07-03, 23:49
I guess I am more concerned about the long term than the short term. I
think if you guys want to have all the Hadoop+Kafka stuff then we should
move the producer there and it sounds like it would be possible to get
similar functionality from the existing consumer code. I am not in a rush; I
just want to figure out a plan.

The alternative is if there is anyone who is interested in maintaining this
stuff in Kafka. The current state where it is poorly documented and
maintained is not good.

-Jay
On Wed, Jul 3, 2013 at 1:51 PM, Ken Goodhope <[EMAIL PROTECTED]> wrote:

> We can easily make a Camus configuration that would mimic the
> functionality of the hadoop consumer in contrib.  It may require the
> addition of a BinaryWritable decoder, and a couple minor code changes.  As
> for the producer, we don't have anything in Camus that does what it does.
> But maybe we should at some point.  In the meantime, Gaurav is going to
> take a look at what is in contrib and see if it is easily fixed.  I have a
> feeling it probably will take minimal effort, and allow us to kick the can
> down the road till we get more time to properly address this.
>
> @Jay, would this work for now?
>
> Ken
>
>
> On Wed, Jul 3, 2013 at 10:57 AM, Felix GV <[EMAIL PROTECTED]> wrote:
>
>> IMHO, I think Camus should probably be decoupled from Avro before the
>> simpler contribs are deleted.
>>
>> We don't actually use the contribs, so I'm not saying this for our sake,
>> but it seems like the right thing to do to provide simple examples for this
>> type of stuff, no...?
>>
>> --
>> Felix
>>
>>
>> On Wed, Jul 3, 2013 at 4:56 AM, Cosmin Lehene <[EMAIL PROTECTED]> wrote:
>>
>>> If the Hadoop consumer/producers use-case will remain relevant for Kafka
>>> (I assume it will), it would make sense to have the core components
>>> (kafka input/output format at least) as part of Kafka so that it could be
>>> built, tested and versioned together to maintain compatibility.
>>> This would also make it easier to build custom MR jobs on top of Kafka,
>>> rather than having to decouple stuff from Camus.
>>> It would also be less confusing for users, at least when they start
>>> using Kafka.
>>>
>>> Camus could use those instead of providing its own.
>>>
>>> This being said we did some work on the consumer side (0.8 and the
>>> new(er) MR API).
>>> We could probably try to rewrite them to use Camus or fix Camus or
>>> whatever, but please consider this alternative as well.
>>>
>>> Thanks,
>>> Cosmin
>>>
>>>
>>>
>>> On 7/3/13 11:06 AM, "Sam Meder" <[EMAIL PROTECTED]> wrote:
>>>
>>> >I think it makes sense to kill the hadoop consumer/producer code in
>>> >Kafka, given, as you said, Camus and the simplicity of the Hadoop
>>> >producer.
>>> >
>>> >/Sam
>>> >
>>> >On Jul 2, 2013, at 5:01 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> We currently have a contrib package for consuming and producing messages
>>> >> from mapreduce (
>>> >> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tree;f=contrib;h=e53e1fb34893e733b10ff27e79e6a1dcbb8d7ab0;hb=HEAD
>>> >> ).
>>> >>
>>> >> We keep running into problems (e.g. KAFKA-946) that are basically due to
>>> >> the fact that the Kafka committers mostly don't seem to be Hadoop
>>> >> developers and aren't doing a good job of maintaining this code (keeping
>>> >> it tested, improving it, documenting it, writing tutorials, getting it
>>> >> moved over to the more modern apis, getting it working with newer Hadoop
>>> >> versions, etc).
>>> >>
>>> >> A couple of options:
>>> >> 1. We could try to get someone in the Kafka community (either a current
>>> >> committer or not) who would adopt this as their baby (it's not much code).
>>> >> 2. We could just let Camus take over this functionality. They already have
>>> >> a more sophisticated consumer and the producer is pretty minimal.
>>> >>
>>> >> So are there any people who would like to adopt the current Hadoop