Kafka events to S3?


Re: Kafka events to S3?
I've always hoped that since Kafka is agnostic about message payload
format (right?), the written format might be too... but maybe that is
a bit oversimplified.

Russell Jurney http://datasyndrome.com

On May 23, 2012, at 11:19 AM, S Ahmed <[EMAIL PROTECTED]> wrote:

>> Kafka handles
>> scaling the consumption while making sure each consumer gets a subset of
>> data.
> Is there a writeup on the algorithm used to do that? Sounds interesting :)
>
> Agreed, this sounds like more of a contrib.
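
For the curious, the mechanism behind "each consumer gets a subset of
data" is Kafka's consumer-group rebalance, which divides a topic's
partitions among the live consumers, coordinated through ZooKeeper. The
snippet below only illustrates the range-style split, and is not
Kafka's actual code; partitionsFor is a made-up helper.

import java.util.ArrayList;
import java.util.List;

public class RangeAssignmentSketch {
    // Which partition ids consumer i (of c consumers) owns, given p partitions.
    static List<Integer> partitionsFor(int i, int c, int p) {
        int base = p / c, extra = p % c;           // first `extra` consumers get one more
        int start = i * base + Math.min(i, extra);
        int count = base + (i < extra ? 1 : 0);
        List<Integer> owned = new ArrayList<Integer>();
        for (int k = start; k < start + count; k++) owned.add(k);
        return owned;
    }

    public static void main(String[] args) {
        // 8 partitions over 3 consumers -> [0, 1, 2], [3, 4, 5], [6, 7]
        for (int i = 0; i < 3; i++)
            System.out.println("consumer " + i + " -> " + partitionsFor(i, 3, 8));
    }
}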
>
> On Wed, May 23, 2012 at 1:49 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> Basically it would just be a consumer that wrote to S3. Kafka handles
>> scaling the consumption while making sure each consumer gets a subset of
>> data. Probably we could make some command line tool. You would need some
>> way to let the user control the format of the S3 data in a pluggable
>> fashion. It could be a contrib package, or even just a separate github
>> mini-project since it just works off the public api and would really just
>> be used by people who want to get stuff into S3.
>>
>> -Jay
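
A minimal sketch of the consumer Jay describes, with the output format
kept pluggable as he suggests. The MessageFormatter interface, the
bucket name, and the key scheme are invented for illustration; the S3
call is the AWS SDK for Java's putObject:

import java.io.ByteArrayInputStream;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectMetadata;

// Pluggable output format, per Jay's suggestion: users supply their own.
interface MessageFormatter {
    byte[] format(byte[] message);
}

public class S3SinkSketch {
    private final AmazonS3Client s3 = new AmazonS3Client(); // credentials from environment
    private final MessageFormatter formatter;
    private final String bucket;

    S3SinkSketch(MessageFormatter formatter, String bucket) {
        this.formatter = formatter;
        this.bucket = bucket;
    }

    // Called for each message pulled off a Kafka consumer stream.
    void handle(byte[] message, long offset) {
        byte[] body = formatter.format(message);
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(body.length);
        // One object per message keeps the sketch short; a real tool would batch.
        s3.putObject(bucket, "events/" + offset, new ByteArrayInputStream(body), meta);
    }
}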
>>
>> On Wed, May 23, 2012 at 8:21 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
>>
>>> What would be needed to do this?
>>>
>>> Just thinking off the top of my head:
>>>
>>> 1. Create a ZooKeeper store to keep track of the last message offset
>>> persisted to S3, and which messages each consumer is processing.
>>>
>>> 2. Pull messages off, group them however you want (per message, 10
>>> messages, etc.), and spin off an ExecutorService to push to S3 and
>>> update the ZooKeeper offset.
>>>
>>> I'm new to Kafka, so I would have to investigate how multiple consumers
>>> can pull messages and push to S3 without pulling the same messages, and
>>> how to set up a ZooKeeper store to track what has already been pushed
>>> to S3.
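
A rough sketch of steps 1 and 2 above: batch messages, hand each batch
to an ExecutorService, and record progress in ZooKeeper. The node path,
batch size, and the uploadBatch helper are all invented for
illustration, and the offset node is assumed to exist already:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.zookeeper.ZooKeeper;

public class BatchingS3PusherSketch {
    private static final int BATCH_SIZE = 10;                    // "per message, 10 messages, etc."
    private static final String OFFSET_PATH = "/s3-sink/offset"; // invented path

    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final ZooKeeper zk;
    private final List<byte[]> batch = new ArrayList<byte[]>();

    BatchingS3PusherSketch(ZooKeeper zk) { this.zk = zk; }

    // Step 2: group messages, push each full batch to S3 off-thread,
    // then record the last offset covered by that batch (step 1).
    synchronized void onMessage(byte[] message, final long offset) {
        batch.add(message);
        if (batch.size() >= BATCH_SIZE) {
            final List<byte[]> toShip = new ArrayList<byte[]>(batch);
            batch.clear();
            pool.submit(new Runnable() {
                public void run() {
                    uploadBatch(toShip);   // hypothetical S3 helper, as in the sketch above
                    commitOffset(offset);
                }
            });
        }
    }

    void commitOffset(long offset) {
        try {
            zk.setData(OFFSET_PATH, Long.toString(offset).getBytes(), -1); // -1 = any version
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    void uploadBatch(List<byte[]> messages) { /* S3 putObject, as above */ }
}

One catch with this sketch: with several executor threads, batches can
finish out of order, so a real implementation would only commit an
offset once every earlier batch has also completed.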
>>>
>>>
>>> On Wed, May 23, 2012 at 1:35 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>
>>>> Yeah, no kidding. I keep waiting on one :)
>>>>
>>>> Russell Jurney http://datasyndrome.com
>>>>
>>>> On May 22, 2012, at 10:31 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> No. Patches accepted.
>>>>>
>>>>> -Jay
>>>>>
>>>>> On Tue, May 22, 2012 at 10:23 PM, Russell Jurney
>>>>> <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>>> Is there a simple way to dump Kafka events to S3 yet?
>>>>>>
>>>>>> Russell Jurney http://datasyndrome.com
>>>>>>
>>>>
>>>
>>