Kafka, mail # dev - Kafka stream processing framework?


David Arthur 2012-11-08, 15:32
Prashanth Menon 2012-11-08, 15:45
David Arthur 2012-11-08, 16:38
Prashanth Menon 2012-11-08, 20:37
Re: Kafka stream processing framework?
Milind Parikh 2012-11-08, 22:56
I have a very early version of streaming in Kafka. I will not be able
to respond for the next nine days because I will have no internet
connectivity, but the early version (0.0.1) streams events to the browser
as they become available.

www.github.com/milindparikh/streamkl

Regards
Milind

On Fri, Nov 9, 2012 at 2:07 AM, Prashanth Menon <[EMAIL PROTECTED]> wrote:

> Hi David,
>
> The pattern you mention is a very common one and while Kafka may be a good
> fit, it's impossible to know without more information.  Mind you, I'm a
> committer ...
>
> - Do you need to re-read or replay messages?
> - Are all your consumers always online?
> - At what rate are messages coming in?
> - Do you need to process all your messages in-order?
>
> What most will suggest is to go with RabbitMQ or ActiveMQ with a queue +
> workers where IDs are partitioned across the set of workers.  This is
> simple and I suspect should satisfy your requirements.  If the
> "distribution" aspect is especially what you need, you'll have to wait for
> the 0.8 Kafka release.  IIRC, RabbitMQ has clustering capabilities (you'll
> have to fuss around setting up an NFS so durable messages are persisted in
> the cluster) and ActiveMQ can operate in P2P and brokered mode.
>
> Storm, as you mentioned, is more of a stream *processing* system that
> allows you to filter, process, pipe and connect a "firehose".
> Interestingly enough, you can use Kafka as a "firehose" that feeds into
> Storm, but this isn't what you're looking for (but it's quite interesting
> nonetheless).
>
> Hope that helps - others are welcome to chime in, too :)
>
> - Prashanth
>
> On Thu, Nov 8, 2012 at 11:38 AM, David Arthur <[EMAIL PROTECTED]> wrote:
>
> > Prashanth,
> >
> > Storm seems to be more focused on data flow between Storm processors
> > (spout? bolt? I forget). My particular use case follows this pattern:
> >
> > * read id from kafka queue
> > * fetch object from database
> > * modify the object
> > * write back to database
> >
> > Would Storm be a good fit for this? It doesn't seem to fit in with the
> > whole bolt/spout pattern. It's more like a distributed task queue.
> >
> > Thoughts?
> >
> > On Nov 8, 2012, at 10:45 AM, Prashanth Menon wrote:
> >
> > > Yup, I believe Storm has a KafkaSpout that you can use.  Is there
> > > something specific you were interested in?
> > >
> > > On Thu, Nov 8, 2012 at 10:32 AM, David Arthur <[EMAIL PROTECTED]> wrote:
> > >
> > >> There is a line item on the project ideas for "improved stream processing
> > >> libraries". I was wondering if anyone has done any work on this. I know you
> > >> can hook Kafka into things like Storm and S4(?), but I'm not looking for a
> > >> CEP/dataflow thing, just distributed stream processing
> > >>
> > >> -David
> >
> >
>
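
The pattern discussed in this thread (read an ID from a queue, fetch the
object, modify it, write it back, with IDs partitioned across a set of
workers) can be sketched roughly as below. This is only an illustration
under stated assumptions, not code from any poster or from Kafka itself:
a plain Python list stands in for the queue, a dict stands in for the
database, and the names partition_for, route, and process are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(object_id: str, num_workers: int) -> int:
    """Map an ID to a worker index using a stable hash (CRC32)."""
    return zlib.crc32(object_id.encode("utf-8")) % num_workers


def route(ids, num_workers):
    """Group incoming IDs by worker, keeping arrival order within each worker."""
    queues = defaultdict(list)
    for object_id in ids:
        queues[partition_for(object_id, num_workers)].append(object_id)
    return queues


def process(worker_queue, database, modify):
    """For each ID: fetch the object, modify it, write it back."""
    for object_id in worker_queue:
        obj = database[object_id]          # fetch object from "database"
        database[object_id] = modify(obj)  # modify and write back


# Hypothetical stand-ins for the real queue and database.
database = {"a": 0, "b": 10, "c": 20}
incoming_ids = ["a", "b", "a", "c", "a"]

queues = route(incoming_ids, num_workers=3)
for worker_queue in queues.values():
    # In a real deployment each queue would be drained by a separate worker.
    process(worker_queue, database, modify=lambda value: value + 1)
```

Because every occurrence of a given ID hashes to the same worker, updates
to one object are applied by a single worker in arrival order, which avoids
two workers racing on the same database row.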