Re: Kafka/Hadoop consumers and producers
Andrew Otto 2013-08-13, 14:47
Andrew,

I'm about to dive into figuring out how to use Camus without Avro.  Perhaps we should join forces?  (Be warned though! My java fu is low at the moment. :) )

-Ao
On Aug 12, 2013, at 11:20 PM, Andrew Psaltis <[EMAIL PROTECTED]> wrote:

> Kam,
> I am perfectly fine if you pick this up. After thinking about it for a
> while, we are going to upgrade to Kafka 0.8.0 and also use Camus, as it
> more closely matches our use case, with the caveat that we do not use Avro.
> With that said, I will try to work on the back-port of the custom data
> writer patch [1]; however, I am not sure how quickly I will get this done,
> as we are going to work towards upgrading our Kafka cluster.
>
> Thanks,
> Andrew
>
> [1]
> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8
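
For anyone else going the no-Avro route, the shape of that patch is essentially a pluggable decoder that hands the pipeline raw bytes instead of Avro records. Below is a minimal sketch of the idea; the class and method names are guesses rather than Camus's actual API, so treat the commit linked above as the authority.

import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch of a non-Avro decoder: take the raw Kafka
// payload and emit a plain String record instead of deserializing
// Avro. Names are illustrative, not the real Camus interfaces.
public class StringMessageDecoder {
    private Properties props;
    private String topic;

    // Decoders are typically initialized with job properties and the
    // topic they will read from.
    public void init(Properties props, String topic) {
        this.props = props;
        this.topic = topic;
    }

    // Convert the raw payload bytes to a UTF-8 string record.
    public String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }
}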
>
>
>
>
>
> On 8/12/13 6:16 PM, "Kam Kasravi" <[EMAIL PROTECTED]> wrote:
>
>> I would like to do this refactoring since I did a high level consumer a
>> while ago.
>> A few weeks ago I opened KAFKA-949 (Kafka on YARN), which I was also
>> hoping to contribute.
>> It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds Kafka
>> 0.8 to the Bigtop distribution.
>> KAFKA-949 basically allows Kafka brokers to be started up as sysvinit
>> services, which would ease some of the startup/configuration issues that
>> newbies have when getting started with Kafka. Ideally I would like to
>> fold a number of kafka/bin/* commands into the kafka service. Andrew,
>> please let me know if you would like to pick this up instead. Thanks!
>>
>> Kam
>>
>>
>> ________________________________
>> From: Jay Kreps <[EMAIL PROTECTED]>
>> To: Ken Goodhope <[EMAIL PROTECTED]>
>> Cc: Andrew Psaltis <[EMAIL PROTECTED]>;
>> [EMAIL PROTECTED]; "[EMAIL PROTECTED]"
>> <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]"
>> <[EMAIL PROTECTED]>; Felix GV <[EMAIL PROTECTED]>; Cosmin Lehene
>> <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>;
>> "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Sent: Saturday, August 10, 2013 3:30 PM
>> Subject: Re: Kafka/Hadoop consumers and producers
>>
>>
>> So guys, just to throw my 2 cents in:
>>
>> 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
>> package wasn't getting as much attention as it should.
>>
>> 2. Andrew or anyone--if anyone using the contrib package would be willing
>> to volunteer to adopt it, that would be great. I am happy to help in
>> whatever way I can. The practical issue is that most of the committers are
>> either using Camus or not using Hadoop at all, so we just haven't been
>> doing a good job of documenting, bug fixing, and supporting the contrib
>> packages.
>>
>> 3. Ken, if you could document how to use Camus, that would likely make it a
>> lot more useful to people. I think most people would want a full-fledged
>> ETL solution and would likely prefer Camus, but very few people are using
>> Avro.
>>
>> -Jay
>>
>>
>> On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <[EMAIL PROTECTED]>
>> wrote:
>>
>>> I just checked, and that patch is in the 0.8 branch. Thanks for working
>>> on back-porting it, Andrew. We'd be happy to commit that work to master.
>>>
>>> As for the kafka contrib project vs. Camus, they are similar but not
>>> quite identical. Camus is intended to be a high-throughput ETL for bulk
>>> ingestion of Kafka data into HDFS, whereas what we have in contrib is
>>> more of a simple KafkaInputFormat. Neither can really replace the other.
>>> If you had a complex Hadoop workflow and wanted to introduce some Kafka
>>> data into that workflow, using Camus would be gigantic overkill and a
>>> pain to set up. On the flip side, if what you want is frequent, reliable
>>> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
>>> with that.
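
To make the distinction concrete, here is roughly what pulling Kafka data into an existing Hadoop workflow with a contrib-style InputFormat could look like as a map-only job. This is a sketch only: the KafkaInputFormat class name, the key/value types, and the configuration keys are assumptions for illustration, not the actual contrib API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class KafkaIngestJob {

    // Pass each message through unchanged; a real job would parse or
    // transform the payload here.
    public static class PassThroughMapper
            extends Mapper<LongWritable, BytesWritable, LongWritable, BytesWritable> {
        @Override
        protected void map(LongWritable offset, BytesWritable message, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(offset, message);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("kafka.brokers", "broker1:9092"); // assumed config key
        conf.set("kafka.topic", "events");         // assumed config key

        Job job = Job.getInstance(conf, "kafka-ingest");
        job.setJarByClass(KafkaIngestJob.class);
        job.setInputFormatClass(KafkaInputFormat.class); // contrib-style format, name assumed
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0); // map-only ingest, no reduce phase
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(BytesWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The contrast with Camus is that Camus layers things like offset management and partitioned output on top of this kind of job, which is exactly the machinery you want for continuous ingest and don't want for a one-off pull into a larger workflow.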
>>>
>>> I think it would be preferable to simplify the existing contrib
>>> Input/OutputFormats by refactoring them to use the more stable high-level
>>> consumer API.