Kafka >> mail # user >> Kafka events to S3?


Re: Kafka events to S3?
I've always hoped that since Kafka is agnostic about message payload
format (right?), the written format might be too... but maybe that is
a bit oversimplified.

Russell Jurney http://datasyndrome.com

On May 23, 2012, at 11:19 AM, S Ahmed <[EMAIL PROTECTED]> wrote:

>> Kafka handles
>> scaling the consumption while making sure each consumer gets a subset of
>> data.
> Is there a writeup on the algorithm used to do that? Sounds interesting :)
>
> Agreed, this sounds like more of a contrib.
>
> On Wed, May 23, 2012 at 1:49 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> Basically it would just be a consumer that wrote to S3. Kafka handles
>> scaling the consumption while making sure each consumer gets a subset of
>> data. Probably we could make some command line tool. You would need some
>> way to let the user control the format of the S3 data in a pluggable
>> fashion. It could be a contrib package, or even just a separate github
>> mini-project since it just works off the public api and would really just
>> be used by people who want to get stuff into S3.
>>
>> -Jay
>>
>> On Wed, May 23, 2012 at 8:21 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
>>
>>> What would be needed to do this?
>>>
>>> Just thinking off the top of my head:
>>>
>>> 1. create a zookeeper store to keep track of the last message offset
>>> persisted to s3, and which messages each consumer is processing.
>>>
>>> 2. pull messages off and group in whatever grouping you want (per message,
>>> 10 messages, etc.), and spin off an ExecutorService to push to s3, then
>>> update the zookeeper offset.
>>>
>>> I'm new to kafka, but I would have to investigate how multiple consumers
>>> can pull messages and push to s3, while not having the consumers pull the
>>> same messages.
>>> This would mean setting up a zookeeper store to track progress specifically
>>> for what has been pushed to s3.
>>>
>>>
>>> On Wed, May 23, 2012 at 1:35 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>
>>>> Yeah, no kidding. I keep waiting on one :)
>>>>
>>>> Russell Jurney http://datasyndrome.com
>>>>
>>>> On May 22, 2012, at 10:31 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> No. Patches accepted.
>>>>>
>>>>> -Jay
>>>>>
>>>>> On Tue, May 22, 2012 at 10:23 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Is there a simple way to dump Kafka events to S3 yet?
>>>>>>
>>>>>> Russell Jurney http://datasyndrome.com
>>>>>>
>>>>
>>>
>>
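
For readers skimming the thread, here is a minimal sketch of the design discussed above: a consumer that batches messages, formats each batch with a pluggable function, pushes it to an object store, and only then records the offset. It is not an existing Kafka tool or API. Every name in it (`drain_partition`, `newline_formatter`, `InMemoryObjectStore`, `InMemoryOffsetStore`) is hypothetical, the two stores are in-memory stand-ins for S3 and ZooKeeper, and offsets are treated as simple consecutive integers purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Message = Tuple[int, bytes]                 # (offset, payload); offsets assumed consecutive
Formatter = Callable[[List[bytes]], bytes]  # pluggable: payloads in, one object body out


def newline_formatter(payloads: List[bytes]) -> bytes:
    """Example format: one payload per line."""
    return b"\n".join(payloads) + b"\n"


@dataclass
class InMemoryObjectStore:
    """Hypothetical stand-in for S3; a real tool would call an S3 client here."""
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


@dataclass
class InMemoryOffsetStore:
    """Hypothetical stand-in for a ZooKeeper node holding the next offset to persist."""
    offsets: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def commit(self, topic: str, partition: int, next_offset: int) -> None:
        self.offsets[(topic, partition)] = next_offset

    def last(self, topic: str, partition: int) -> int:
        return self.offsets.get((topic, partition), 0)


def drain_partition(
    topic: str,
    partition: int,
    messages: Iterable[Message],
    store: InMemoryObjectStore,
    offsets: InMemoryOffsetStore,
    formatter: Formatter = newline_formatter,
    batch_size: int = 10,
) -> List[str]:
    """Upload messages in batches; return the object keys written."""
    batch: List[Message] = []
    uploaded: List[str] = []

    def flush() -> None:
        if not batch:
            return
        first, last = batch[0][0], batch[-1][0]
        key = f"{topic}/{partition}/{first:020d}-{last:020d}"
        store.put(key, formatter([payload for _, payload in batch]))
        # Record progress only after the upload, so a crash re-sends a batch
        # instead of silently dropping it.
        offsets.commit(topic, partition, last + 1)
        uploaded.append(key)
        batch.clear()

    start = offsets.last(topic, partition)
    for offset, payload in messages:
        if offset < start:
            continue  # already persisted in an earlier run
        batch.append((offset, payload))
        if len(batch) >= batch_size:
            flush()
    flush()  # write any trailing partial batch
    return uploaded
```

Because the key is derived from the first and last offset in the batch, re-uploading a batch after a crash overwrites the same object rather than creating a duplicate. Spreading partitions across several consumers is left to Kafka, as Jay describes; this sketch covers a single partition only.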