Zookeeper >> mail # dev >> Comparison of kafka and Hedwig?

Re: Comparison of kafka and Hedwig?
i'm not sure cross posting is such a good idea, it is probably better to
discuss these comparisons relative to a project's viewpoint.

the problem space is similar, and i imagine as kafka develops the
implementations may become even closer.

there are a couple of obvious things:

1) hedwig has strong durability guarantees. kafka can lose data due to
2) hedwig was designed for lots of topics (100,000s) with low fan out
(few subscribers/publishers). i think kafka is designed for a few topics
with lots of subscribers and publishers.
3) hedwig tracks subscribers' progress for gc of publishes. kafka uses
time based gc.
4) hedwig will replay messages to subscribers starting from the last
message they explicitly consumed. kafka allows subscribers to replay
messages that they have already consumed.

there are probably others.
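
to make 3 and 4 concrete, here is a minimal python sketch of the two gc policies as described above. it is a toy in-memory model, not hedwig or kafka code; ToyTopic and all of its method names are made up for illustration.

```python
class ToyTopic:
    """Toy append-only topic log (hypothetical; not Hedwig or Kafka code)."""

    def __init__(self):
        self.messages = []   # list of (seq, timestamp, payload)
        self.next_seq = 0
        self.consumed = {}   # subscriber name -> last consumed seq

    def subscribe(self, name):
        # -1 means the subscriber has not consumed anything yet.
        self.consumed[name] = -1

    def publish(self, payload, now):
        self.messages.append((self.next_seq, now, payload))
        self.next_seq += 1

    def consume(self, name, seq):
        # Record that `name` has explicitly consumed everything up to `seq`.
        self.consumed[name] = max(self.consumed[name], seq)

    def gc_by_progress(self):
        # Point 3, hedwig side: a message can be dropped only once every
        # subscriber has consumed it, so keep everything after the
        # slowest subscriber's position.
        if not self.consumed:
            return
        slowest = min(self.consumed.values())
        self.messages = [m for m in self.messages if m[0] > slowest]

    def gc_by_age(self, now, max_age):
        # Point 3, kafka side: drop messages older than max_age,
        # regardless of who has consumed them.
        self.messages = [m for m in self.messages if now - m[1] <= max_age]

    def read_from(self, seq):
        # Point 4: with time based gc a subscriber can re-read anything
        # still retained, including messages it already consumed.
        return [m[2] for m in self.messages if m[0] >= seq]
```

with progress based gc the slowest subscriber decides what is kept, so one stalled subscriber holds back reclamation. with time based gc the retention window decides, which is what lets a subscriber replay already consumed messages for as long as they are still inside the window.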

i really like the kafka design choices made for 3 and 4. hedwig will
work on scaling to more subscribers/publishers per topic. i imagine, if
needed, kafka will work on their durability guarantees and support for
a large number of topics.


On 02/10/2011 01:27 AM, Thomas Koch wrote:
> Flavio Junqueira:
>> Thomas, Did you mean to say Hedwig instead of BookKeeper?
> Oh sh..ugar yeah. Thanks. Start over again:
> I've just had a look at the kafka slides[1] from January HUG. It seems to me,
> that Hedwig[2] and kafka are quite similar in their problem space. Is that
> so? What are notable differences?
> (Kafka is written in scala and therefore must be a lot cooler :-)
> [1] <http://developer.yahoo.com/blogs/hadoop/posts/2011/02/hadoop-user-group-january-2011-recap/>
> [2] http://cwiki.apache.org/confluence/display/ZOOKEEPER/HedWig
> Thomas Koch, http://www.koch.ro