Kafka >> mail # user >> Kafka in AWS?


Re: Kafka in AWS?
Neha,

My requirement is not related to Russell's, but I thought it would be
helpful to describe what we need at GumGum <http://gumgum.com/>.
I wasn't sure whether this is in Kafka's domain, since Kafka gives you a topic
to pull data from and then it's up to you to do whatever you want with it.

But since we are talking about it, here is what we do every day (currently
without Kafka):

We are an ad network. We write all of our impression and click data to
various log files and upload them to S3. At night we run many MapReduce jobs
to aggregate this data in various ways.
We have an autoscaled cluster in AWS. Our web servers keep going up and
down based on the load on the system.

Whenever a server shuts down we tend to lose data, because the file upload
often does not complete before the server shuts down. That is why we are
looking at implementing Kafka to send events in real time to S3 without
losing them.

If there were a 'sink' that transfers data to S3, our job would be a lot
easier. But again, I am not sure whether Kafka is supposed to provide that
or not.
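
To make the idea concrete, here is a rough sketch of the aggregation step such
a sink would need, since (as Niek notes in the quoted thread) you can't append
to an existing S3 object. Everything in it is an assumption for illustration:
the class name S3BatchSink, the upload(key, body) callable (which would wrap an
S3 PUT in practice), the size and age thresholds, and the key layout based on
message offsets. It does not talk to Kafka or S3 itself, and it assumes
messages contain no newlines.

```python
import time
from typing import Callable, List, Optional


class S3BatchSink:
    """Buffers messages and hands each batch to `upload` as a single object.

    `upload(key, body)` is a hypothetical callable supplied by the caller;
    in a real deployment it would perform the S3 PUT.
    """

    def __init__(self, upload: Callable[[str, bytes], None], topic: str,
                 max_bytes: int = 5 * 1024 * 1024,
                 max_age_s: float = 60.0) -> None:
        self._upload = upload
        self._topic = topic
        self._max_bytes = max_bytes
        self._max_age_s = max_age_s
        self._buf: List[bytes] = []
        self._size = 0
        self._first_offset = 0
        self._last_offset = 0
        self._started = 0.0

    def add(self, offset: int, message: bytes,
            now: Optional[float] = None) -> None:
        """Append one consumed message; flush if the batch is big or old."""
        now = time.time() if now is None else now
        if not self._buf:
            # First message of a new batch: remember where and when it began.
            self._first_offset = offset
            self._started = now
        self._buf.append(message)
        self._size += len(message) + 1  # +1 for the newline separator
        self._last_offset = offset
        too_big = self._size >= self._max_bytes
        too_old = now - self._started >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Upload the buffered messages as one newline-delimited object."""
        if not self._buf:
            return
        # Offsets in the key make each object name unique and sortable.
        key = "%s/%020d-%020d.log" % (
            self._topic, self._first_offset, self._last_offset)
        body = b"\n".join(self._buf) + b"\n"
        self._upload(key, body)
        self._buf = []
        self._size = 0
```

A consumer loop would call add() for every message it pulls from the topic and
flush() before it exits, so a partial batch is not left behind. The consumer
offset should only be treated as processed once the upload has returned.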

Regards,
Vaibhav
On Tue, Mar 20, 2012 at 10:03 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Russell,
>
> By "sink events into S3", do you mean you want to have some plugin that
> will suck data out of your Kafka brokers and upload it to S3? Would you mind
> describing the use cases that require sending data to Kafka, then uploading
> it to S3, and then querying it from S3?
>
> Thanks,
> Neha
> On Mar 20, 2012 4:51 PM, "Russell Jurney" <[EMAIL PROTECTED]> wrote:
>
> > I think as soon as someone commits code that reliably sinks events to S3,
> > Kafka adoption will skyrocket.  There is no good solution to this yet.
> > MANY people want one.
> >
> > Russ
> >
> > On Tue, Mar 20, 2012 at 3:32 PM, Felix GV <[EMAIL PROTECTED]> wrote:
> >
> > > The primary use case for Kafka is to use it on AWS...???
> > >
> > > > Sorry if I put words you didn't intend in your mouth :P ... I just thought
> > > > that sounded funny ;)
> > >
> > > Sorry for being off-topic. Carry on :/ !
> > >
> > > --
> > > Felix
> > >
> > >
> > >
> > > > On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> > >
> > > > > Yeah, that is the part I am hoping someone will contribute :)  I know I can
> > > > > write that myself.  I also know it will be buggy and that I will have lots
> > > > > of trouble.
> > > >
> > > > > If you contribute this code, it would be a huge boon to Kafka.  It is imo
> > > > > the primary use case for Kafka atm... if only the code gets into git.
> > > >
> > > > > On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Russell,
> > > > >
> > > > > > I'm actually in the process of writing Java code to go from Kafka
> > > > > > messages to S3.  I might be able to rip out my application-specific
> > > > > > parts and share something later tonight.
> > > > >
> > > > > > The biggest hassle is that you can't append to existing S3 files.  So
> > > > > > unless you're planning on uploading each message as a separate S3
> > > > > > object, this means you need message aggregation smarts on the Kafka
> > > > > > consumer / S3 uploader side of things.
> > > > >
> > > > > Best,
> > > > > Niek
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
> > > > > <[EMAIL PROTECTED]> wrote:
> > > > > > > I wish someone would publish some source that writes events to S3.
> > > > > >
> > > > > > Russell Jurney
> > > > > > twitter.com/rjurney
> > > > > > [EMAIL PROTECTED]
> > > > > > datasyndrome.com
> > > > > >
> > > > > > > On Mar 20, 2012, at 11:20 AM, Dave Fayram <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > >> We've been successfully using Kafka on AWS as well, and JMX-wise we
> > > > > > >> just use an SSH tunnel.
> > > > > >>
> > > > > > >> In general, we've been very happy with the performance on AWS, which
> > > > > > >> some people have reservations about due to the I/O situation on