-
Kafka events to S3? Russell Jurney 2012-05-23, 05:23
-
Re: Kafka events to S3? Jay Kreps 2012-05-23, 05:30
No. Patches accepted.
-Jay

On Tue, May 22, 2012 at 10:23 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> Is there a simple way to dump Kafka events to S3 yet?
>
> Russell Jurney http://datasyndrome.com
-
Re: Kafka events to S3? Russell Jurney 2012-05-23, 05:35
Yeah, no kidding. I keep waiting on one :)
Russell Jurney http://datasyndrome.com
-
Re: Kafka events to S3? S Ahmed 2012-05-23, 15:21
What would be needed to do this?
Just thinking off the top of my head:

1. Create a ZooKeeper store to keep track of the last message offset persisted to S3, and which messages each consumer is processing.

2. Pull messages off and group them in whatever grouping you want (per message, 10 messages, etc.), spin off an ExecutorService to push to S3, and update the ZooKeeper offset.

I'm new to Kafka, but I would have to investigate how multiple consumers can pull messages and push to S3 without pulling the same messages. That would mean setting up a ZooKeeper store to track progress specifically for what has been pushed to S3.
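The two steps above can be sketched roughly as follows. This is a minimal illustration, not a working Kafka or S3 client: `InMemoryOffsetStore` stands in for the ZooKeeper store, `fake_upload` stands in for an S3 PUT, the `events/<first>-<last>` key scheme and the newline join are invented for the example, and Python's `ThreadPoolExecutor` plays the role of the ExecutorService.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryOffsetStore:
    """Hypothetical stand-in for a ZooKeeper node holding the last offset persisted to S3."""

    def __init__(self):
        self.last_persisted = -1

    def commit(self, offset):
        self.last_persisted = offset


def fake_upload(bucket, key, body):
    """Hypothetical stand-in for an S3 PUT; `bucket` is a plain dict."""
    bucket[key] = body
    return key


def drain(messages, batch_size, bucket, offsets):
    """Group (offset, payload) pairs into batches, upload each, then commit its last offset."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        for start in range(0, len(messages), batch_size):
            batch = messages[start:start + batch_size]
            first, last = batch[0][0], batch[-1][0]
            # Assumed format: payloads joined with newlines.
            body = b"\n".join(payload for _, payload in batch)
            future = pool.submit(fake_upload, bucket, f"events/{first}-{last}", body)
            future.result()       # wait for the upload to finish...
            offsets.commit(last)  # ...and only then record progress


messages = [(i, f"event-{i}".encode()) for i in range(5)]
bucket, offsets = {}, InMemoryOffsetStore()
drain(messages, 2, bucket, offsets)
```

Committing the offset only after the upload returns means a crash between the two steps would re-upload a batch rather than lose it, so a real tool built this way would likely need to tolerate duplicates.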
-
Re: Kafka events to S3? Jay Kreps 2012-05-23, 17:49
Basically it would just be a consumer that wrote to S3. Kafka handles scaling the consumption while making sure each consumer gets a subset of the data. Probably we could make some command-line tool. You would need some way to let the user control the format of the S3 data in a pluggable fashion. It could be a contrib package, or even just a separate GitHub mini-project, since it just works off the public API and would really just be used by people who want to get stuff into S3.

-Jay
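One way to read "pluggable" here is that the tool takes a formatter chosen by name and leaves the rest of the pipeline unchanged. The sketch below is only an assumption about what that could look like: the `FORMATTERS` registry, the format names, and `build_object` are made up for illustration and do not come from Kafka or any S3 library.

```python
import json
from typing import Callable, Dict, List

# A formatter turns a batch of raw message payloads into one S3 object body.
Formatter = Callable[[List[bytes]], bytes]


def newline_delimited(payloads: List[bytes]) -> bytes:
    """One payload per line; assumes payloads contain no newlines."""
    return b"".join(p + b"\n" for p in payloads)


def json_array(payloads: List[bytes]) -> bytes:
    """A JSON array of strings; assumes payloads are valid UTF-8 text."""
    return json.dumps([p.decode("utf-8") for p in payloads]).encode("utf-8")


# Hypothetical registry; a command-line tool could select an entry via a flag.
FORMATTERS: Dict[str, Formatter] = {
    "lines": newline_delimited,
    "json": json_array,
}


def build_object(payloads: List[bytes], format_name: str) -> bytes:
    """Look up the formatter chosen by the user and apply it to the batch."""
    return FORMATTERS[format_name](payloads)
```

For example, `build_object([b"a", b"b"], "lines")` yields `b"a\nb\n"`, while the same batch with `"json"` yields a JSON array. A user with a binary format would register their own function instead.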
-
Re: Kafka events to S3? S Ahmed 2012-05-23, 18:19
> Kafka handles scaling the consumption while making sure each consumer gets
> a subset of data.

Is there a writeup on the algorithm used to do that? Sounds interesting :)

Agreed, this sounds like more of a contrib.
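For readers wondering what "each consumer gets a subset" could mean in practice, here is a simplified range-style assignment: sort the partitions and the consumers, then hand each consumer a contiguous slice. This is an illustrative assumption, not Kafka's actual rebalancing code; the Kafka design documentation is the place to look for the real algorithm. The function name and the consumer ids are invented.

```python
def assign_partitions(partitions, consumers):
    """Split sorted partitions into contiguous ranges, one range per sorted consumer."""
    parts = sorted(partitions)
    members = sorted(consumers)
    base, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers take one additional partition each.
        count = base + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


assignment = assign_partitions(range(5), ["c2", "c1"])
```

Because every consumer sorts the same inputs and applies the same rule, each can compute its own slice independently, and no partition ends up owned by two consumers in the group.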
-
Re: Kafka events to S3? Russell Jurney 2012-05-23, 18:34
I've always hoped that since Kafka is agnostic about message payload format (right?), the written format might be too... but maybe that is a bit oversimplified.

Russell Jurney http://datasyndrome.com
-
Re: Kafka events to S3? Jay Kreps 2012-05-23, 21:17
That is true, but you want lots of messages in a particular S3 bucket, so you need some kind of separator or delimiter.

-Jay
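If the payloads are opaque bytes, a plain delimiter such as a newline can collide with the data itself. One common alternative is length-prefixed framing, sketched below under the assumption of a 4-byte big-endian length before each message. Nothing in the thread specifies this layout; `frame` and `unframe` are hypothetical names.

```python
import struct


def frame(payloads):
    """Pack payloads into one blob, each preceded by a 4-byte big-endian length."""
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)


def unframe(blob):
    """Recover the original payloads from a blob produced by frame()."""
    out, pos = [], 0
    while pos < len(blob):
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        out.append(blob[pos:pos + size])
        pos += size
    return out


# Payloads may contain newlines or be empty and still round-trip.
payloads = [b"a\nb", b"", b"xyz"]
blob = frame(payloads)
```

With this layout the writer stays agnostic about what is inside each message, at the cost of the stored data no longer being readable line by line with ordinary text tools.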