Kafka >> mail # user >> HA / failover


Re: HA / failover
Sorry, Jun, that it took me so long to reply.
There's still one thing I don't get:

>> There is one offset per topic/partition, if a partition is not available
>> because a broker is down, its offset in the consumer won't grow anymore.

So, because I want HA, I set up 2 brokers to serve the same topic/partition, right?

Using the ZK-based producer, will msgs be sent to only one of those 2 brokers, or will it balance randomly?

If one of those 2 brokers goes down, will the producer start sending messages to the one still alive?
 
Example:
Start zk x 3, kafka x 2 (first run), 1 zk-producer, 1 zk-consumer

Produce msgs 1 & 2
Consume msg 1
kafka A fails -> consumer now reads kafka B
Produce msgs 3 & 4
Consume msgs 3,4
Kafka A is started
Consumer sees it, but won't ask for msg 2

Makes sense?
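To make my mental model concrete, here's a toy sketch of the scenario above (plain Python, not the Kafka API; broker names and message contents are made up) assuming the consumer keeps one offset per broker/partition, as described in the quoted reply:

```python
# Toy model: the consumer tracks one offset per broker/partition,
# so a broker that goes down simply stops advancing its offset.
logs = {"A": ["msg1", "msg2"], "B": []}   # per-broker message logs
offsets = {"A": 0, "B": 0}                # consumer offset per broker
alive = {"A": True, "B": True}

def consume(broker):
    """Fetch the next message from a broker, if it is up and has data."""
    if alive[broker] and offsets[broker] < len(logs[broker]):
        msg = logs[broker][offsets[broker]]
        offsets[broker] += 1
        return msg
    return None

print(consume("A"))                 # msg1
alive["A"] = False                  # kafka A fails
logs["B"] += ["msg3", "msg4"]       # producer fails over to B
print(consume("B"), consume("B"))   # msg3 msg4
alive["A"] = True                   # kafka A restarts
print(consume("A"))                 # msg2 -- A's offset never moved past msg1
```

Under this model the consumer *would* still ask for msg 2 once A comes back, since A's offset was frozen while it was down.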

PS: I'm trying to understand how LinkedIn manages HA with Sensei + Kafka... sorry!
----- Original Message -----
From: Jun Rao [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 30, 2011 03:46 PM
To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Subject: Re: HA / failover

See my inlined reply below.

Thanks,

Jun
On Tue, Aug 30, 2011 at 8:36 AM, Roman Garcia <[EMAIL PROTECTED]> wrote:

> >> Roman,
> Without replication, Kafka can lose messages permanently if the
> underlying storage system is damaged. Setting that aside, there are 2
> ways that you can achieve HA now. In either case, you need to set up a
> Kafka cluster with at least 2 brokers.
>
> Thanks for the clarification Jun. But even then, with replication, you
> could still lose messages, right?
>
>
If you do synchronous replication with replication factor >1 and there is
only 1 failure, you won't lose any messages.
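Put another way (a toy formalization, not Kafka code): with synchronous replication, every message is on `replication_factor` brokers before the producer gets an ack, so the data survives as long as at least one replica remains.

```python
def survives(replication_factor, failures):
    # Synchronous replication: a message is acknowledged only after it is
    # on `replication_factor` brokers, so it is lost only if ALL of its
    # replicas fail.
    return failures < replication_factor

print(survives(2, 1))  # True  -- Jun's case: factor > 1, one failure
print(survives(1, 1))  # False -- no replication, any failure loses data
```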
> >> [...] Unconsumed messages on that broker will not be available for
> consumption until the broker comes up again.
>
> How does a Consumer fetch those "old" messages, given that it did
> already fetch "new" messages at a higher offset? What am I missing?
>

There is one offset per topic/partition, if a partition is not available
because a broker is down, its offset in the consumer won't grow anymore.
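A minimal illustration of that bookkeeping (plain Python; the topic name and offset values are made up):

```python
# Illustrative only: the consumer tracks one offset per (topic, partition).
offsets = {("clicks", 0): 42,   # partition on a live broker: keeps advancing
           ("clicks", 1): 17}   # partition on a dead broker: frozen

# While the broker hosting ("clicks", 1) is down, no fetches happen for it,
# so its offset stays put; when the broker returns, fetching resumes at 17.
offsets[("clicks", 0)] += 1     # a fetch from the live partition advances it
```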
>
> >> The second approach is to use the built-in ZK-based software load
> balancer in Kafka (by setting zk.connect in the producer config). In
> this case, we rely on ZK to detect broker failures.
>
> This is the approach I've tried. I did use zk.connect.
> I started all locally:
> - 2 Kafka brokers (broker id=0 & 1, single partition)
> - 3 zookeeper nodes (all of these on a single box) with different
> election ports and different fs paths/ids.
> - 5 producer threads sending <1k msgs
>
> Then I killed one of the Kafka brokers, and all my producer threads
> died.
>
>
That could be a bug. Are you using trunk? Any errors/exceptions in the log?
> What am I doing wrong?
>
>
> Thanks!
> Roman
>
>
> -----Original Message-----
> From: Jun Rao [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 30, 2011 11:44 AM
> To: [EMAIL PROTECTED]
> Subject: Re: HA / failover
>
> Roman,
>
> Without replication, Kafka can lose messages permanently if the
> underlying storage system is damaged. Setting that aside, there are 2
> ways that you can achieve HA now. In either case, you need to set up a
> Kafka cluster with at least 2 brokers.
>
> The first approach is to put the hosts of all Kafka brokers in a VIP and
> rely on a hardware load balancer to do health checks and routing. In that
> case, all producers send data through the VIP. If one of the brokers is
> down temporarily, the load balancer will direct the produce requests to
> the rest of the brokers. Unconsumed messages on that broker will not be
> available for consumption until the broker comes up again.
>
>  The second approach is to use the built-in ZK-based software load
> balancer in Kafka (by setting zk.connect in the producer config). In
> this case, we rely on ZK to detect broker failures.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 30, 2011 at 7:18 AM, Roman Garcia <[EMAIL PROTECTED]>
> wrote:
>
> > Hi, I'm trying to figure out how my prod environment should look like,