It's one of the latest versions of ZK that comes with Cloudera CDH4.
> Which version of ZK are you using?
> On Fri, Jun 7, 2013 at 12:20 PM, Evan Chan <[EMAIL PROTECTED]> wrote:
> > [ Sorry if this mail is duplicated, this is my fourth try sending this
> > message]
> > Hey guys,
> > I sincerely apologize if this has been covered before, I haven't quite
> > found a similar situation.
> > We are using Kafka 0.7.2 in production, and we are using the ZK
> > high-level Scala consumer. However, we find the ZK consumer very
> > unstable. It would work for one or two weeks, then suddenly complain
> > about ZK nodes disappearing, and one consumer would die, then another,
> > then another, until our pipeline is no longer pulling any data. There
> > are multiple NullPointerExceptions and other problems. We can restart
> > it, but it does not stay up predictably.
> > On the other hand, I have a simple app which I wrote using the simple
> > consumer to mirror select partitions (will blog about this later) and it
> > just works flawlessly.
> > So we are faced with a dilemma to get back on track:
> > 1) Use SimpleConsumer, and write our own balancing code (but honestly,
> > our boxes almost never go down, compared to the rate of ZK mishaps)
> > 2) Upgrade to Kafka 0.8 and hope that that resolves the issue.
> > There seem to be so many improvements in 0.8 that that seems to be the
> > biggest win long-term, so I am wondering if people can comment on:
> > - has anyone tried using 0.8 in production? Is it stable yet?
> > - How much more stable is the ZK consumer in 0.8?
> > - will it be possible to change the offset in the 0.8 consumer? That was
> > the other reason why we wanted to move to SimpleConsumer.
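A minimal sketch of the pattern Evan is after (in Python for illustration — the real Kafka 0.7 SimpleConsumer is a Scala/Java API, and the class names here are hypothetical): with a simple-consumer-style client, the offset lives in the client as a plain integer, so rewinding or replaying is just resetting that integer, with no broker or ZK coordination involved.

```python
# Illustrative sketch, NOT the Kafka API: a simple-consumer-style client
# tracks its own offset, so changing it (e.g. to replay data) is trivial.

class Log:
    """Append-only message log, standing in for a Kafka partition."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)

    def fetch(self, offset, max_messages=10):
        return self.messages[offset:offset + max_messages]


class SimpleStyleConsumer:
    """Client-side offset tracking: the broker keeps no consumer state."""
    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        batch = self.log.fetch(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Rewinding is just resetting a local integer.
        self.offset = offset


log = Log()
for i in range(5):
    log.append(f"msg-{i}")

c = SimpleStyleConsumer(log)
first = c.poll()   # all five messages
c.seek(2)          # rewind: replay from offset 2
replay = c.poll()  # ['msg-2', 'msg-3', 'msg-4']
```

The contrast with the 0.7 high-level consumer is that there the consumed offset is committed to ZooKeeper on the client's behalf, which is exactly the coupling that makes manual offset control awkward.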
> > thanks,
> > Evan
> On Sat, Jun 8, 2013 at 2:09 AM, Jonathan Hodges <[EMAIL PROTECTED]> wrote:
> > Thanks so much for your replies. This has been a great help in
> > understanding Rabbit better, as I have very little experience with it.
> > I have a few follow-up comments below.
> Happy to help!
> I'm afraid I don't follow your arguments below. Rabbit contains many
> optimisations too. I'm told that it is possible to saturate the disk
> i/o, and you saw the message rates I quoted in the previous email.
> YES of course there are differences, mostly an accumulation of things.
> For example, Rabbit spends more time doing work before it writes to disk.
> You said:
> "Since Rabbit must maintain the state of the
> consumers I imagine it’s subjected to random data access patterns on disk
> as opposed to sequential."
> I don't follow the logic here, sorry.
> Couple of side comments:
> * In your Hadoop vs RT example, Rabbit would deliver the RT messages
> immediately and write the rest to disk. It can do this at high rates
> - I shall try to get you some useful data here.
> * Bear in mind that write speed should be orthogonal to read speed.
> Ask yourself - how would Kafka provide a read cache, and when might
> that be useful?
> * I'll find out what data structure Rabbit uses for long term persistence.
> "Quoting the Kafka design page (
> http://kafka.apache.org/07/design.html) performance of sequential writes
> a 6 7200rpm SATA RAID-5 array is about 300MB/sec but the performance of
> random writes is only about 50k/sec—a difference of nearly 10000X."
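The access-pattern difference the design page describes can be seen with a rough sketch like the one below (Python, purely illustrative; `BLOCK` and `N` are arbitrary demo values). Note that at this tiny scale everything lands in the OS page cache, so the timings will not show the spinning-disk gap the design page quotes — the point is only the shape of the two write patterns.

```python
# Sketch: sequential vs random block writes. Absolute numbers depend
# entirely on hardware and page cache; the quoted 300MB/sec vs 50k/sec
# gap only appears at scale on spinning disks.
import os
import random
import tempfile
import time

BLOCK = 4096  # one 4KiB block per write (demo value)
N = 256       # keep the demo small

def sequential_write(path):
    """Append N blocks back to back, the way a Kafka log segment grows."""
    with open(path, "wb") as f:
        for _ in range(N):
            f.write(os.urandom(BLOCK))
        f.flush()
        os.fsync(f.fileno())

def random_write(path):
    """Write the same N blocks at shuffled offsets (seek before each write)."""
    with open(path, "wb") as f:
        f.truncate(N * BLOCK)
        offsets = list(range(N))
        random.shuffle(offsets)
        for i in offsets:
            f.seek(i * BLOCK)
            f.write(os.urandom(BLOCK))
        f.flush()
        os.fsync(f.fileno())

with tempfile.TemporaryDirectory() as d:
    for name, fn in [("sequential", sequential_write), ("random", random_write)]:
        p = os.path.join(d, name)
        t0 = time.perf_counter()
        fn(p)
        print(f"{name}: {time.perf_counter() - t0:.4f}s, "
              f"{os.path.getsize(p)} bytes")
```

On a real spinning disk with data sets larger than RAM, the seek in `random_write` is what costs: each out-of-order block can incur a head movement, while the sequential path lets the drive and OS coalesce writes.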
> Depending on your use case, I'd expect 2x-10x overall throughput
> differences, and will try to find out more info. As I said, Rabbit
> can saturate disk i/o.
> >> While you are correct the payload is a much bigger concern, managing the
> >> metadata and acks centrally on the broker across multiple clients at scale
> >> is also a concern. This would seem to be exacerbated if you have consumers
> >> at different speeds, i.e. Storm and Hadoop consuming the same topic.
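The fast/slow consumer point can be sketched concretely (Python, illustrative only; the consumer names are hypothetical): in a Kafka-style design the broker's entire per-consumer state is one integer offset per consumer, so a fast Storm-like reader and a slow Hadoop-like reader of the same topic never add per-message ack bookkeeping on the broker, and neither slows the other down.

```python
# Sketch: per-consumer state in a Kafka-style log is just an offset, so
# consumers at very different speeds share one topic with no broker-side
# per-message metadata or acks.

log = [f"event-{i}" for i in range(100)]  # the shared, immutable topic log

# ALL the consumer state the broker-side would need: one integer each.
offsets = {"storm": 0, "hadoop": 0}

def consume(name, batch_size):
    """Return the next batch for `name` and advance only that offset."""
    start = offsets[name]
    batch = log[start:start + batch_size]
    offsets[name] = start + len(batch)
    return batch

# Fast consumer drains the log in big batches...
while consume("storm", 25):
    pass

# ...while the slow one trickles along independently.
slow_batch = consume("hadoop", 5)

assert offsets["storm"] == 100   # storm has read everything
assert offsets["hadoop"] == 5    # hadoop is far behind, harmlessly
```

In a broker that tracks delivery and acks per message per client, the same scenario makes the broker's metadata grow with the slowest consumer's backlog — which is the concern raised above.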