Kafka >> mail # user >> Samza -- A YARN stream processing framework for Kafka


Jay Kreps 2013-08-23, 15:40
Jonathan Hodges 2013-08-27, 13:51
Re: Samza -- A YARN stream processing framework for Kafka
I can't answer the rest, but the catchy name comes from Gregor Samsa, a
character in Kafka's novella The Metamorphosis.

https://en.wikipedia.org/wiki/Gregor_Samsa#Gregor_Samsa
-Xavier
On Tue, Aug 27, 2013 at 6:51 AM, Jonathan Hodges <[EMAIL PROTECTED]> wrote:

> First off, I want to say this is awesome!  It has been great to see all the
> YARN offerings being released lately.  I noticed Hadoop 2.x was
> recently voted beta, which is very exciting!
>
> Like many, we use Storm for near-real-time processing of our Kafka-based
> streams.  In addition, we send this data to Hadoop for offline analysis.
> Consolidating these three environments into one is a win by itself.  I also
> really like the fault tolerance and security features.  Are you guys using
> Samza in production yet at LinkedIn, or is it still in development?
>
> The local state approach is very interesting.  Are you guys using Databus
> for the feed of changes from the external stores?  Is something like
> Voldemort integrated locally for the key/value store?  Can you maintain
> multiple tables locally for stream processing?
>
> Since we are using Storm, do any latency comparisons exist?  Since Samza
> makes the fault tolerance/durability tradeoff to persist to disk on every
> hop between StreamTasks, its latency would seem to take a hit.  That said, we
> use Trident a good bit, so many of our topologies are already slowed by
> remote calls to Cassandra.
>
> I know it is fairly new, but were any comparisons against Spark Streaming
> considered?  They take a similar tack of maintaining state locally as
> opposed to in external stores, but I believe they are limited to what can fit
> in memory.
>
> Finally, where did the catchy name, Samza, come from?
>
> Thanks!
> Jonathan
>
>
>
> On Fri, Aug 23, 2013 at 9:39 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > Hey guys,
> >
> > This may be relevant to people on this list. A few of us at LinkedIn have
> > been working on Samza, a stream processing framework built on YARN. We just
> > added this as an Apache Incubator project. We would love to get people's
> > feedback (and help!). Here are the docs:
> >
> > http://samza.incubator.apache.org
> >
> > If anyone has any questions I'm happy to discuss what we are up to. Our
> > mailing list is here:
> >
> > http://samza.incubator.apache.org/community/mailing-lists.html
> >
> > -Jay
> >
>

 