I need to upgrade some Kafka broker servers, so I need to migrate traffic seamlessly from the old brokers to the new ones, without losing data and without stopping producers. I can temporarily stop consumers, etc.
Is there a strategy for this?
Also, because of the way we embed Kafka in our framework, our brokerIds are auto-generated (based on hostname, etc.), so I can't simply copy the broker log files over to a new host and carry the old brokerId along with them.
Is there a way to change the view of the cluster from the producers' standpoint without changing it from the consumers' standpoint? That way, the producers could start writing to the new brokers while the consumers drain all data from the old brokers before switching to the new ones.
I don't actually care about ordering of messages, since the consumers are publishing them to a store that will index them properly based on source timestamp, etc.
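To make the idea concrete, here is a toy Python model (not Kafka code) of giving producers and consumers different views of the cluster. `ToyBroker`, `produce`, `drain`, and `retirable` are hypothetical names. Each broker is just an in-memory queue, and because ordering doesn't matter here, the consumer simply drains the brokers one after another.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker: just a FIFO log."""

    def __init__(self, name):
        self.name = name
        self.log = deque()


def produce(producer_view, messages):
    """Producers only know about the brokers in producer_view (round-robin)."""
    for i, msg in enumerate(messages):
        producer_view[i % len(producer_view)].log.append(msg)


def drain(consumer_view):
    """Consumers read from every broker in consumer_view until each is empty."""
    consumed = []
    for broker in consumer_view:
        while broker.log:
            consumed.append(broker.log.popleft())
    return consumed


def retirable(brokers):
    """Names of brokers with empty logs (assumes they are already out of the producer view)."""
    return [b.name for b in brokers if not b.log]


old = [ToyBroker("old-1"), ToyBroker("old-2")]
new = [ToyBroker("new-1"), ToyBroker("new-2")]

# Backlog already sitting on the old brokers before the switch.
old[0].log.extend(["a", "b"])
old[1].log.append("c")

# Producers are pointed at the new brokers only...
produce(new, ["d", "e", "f"])
# ...while consumers still see old + new and drain everything.
consumed = drain(old + new)
print(sorted(consumed))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(retirable(old))    # ['old-1', 'old-2']
```

Once the old brokers are out of the producer view and empty, they can be shut down. How to narrow the producer view in a real 0.7.2 deployment is the open question here.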
We are using zk for both producer and consumer connections.
This is on 0.7.2. I assume it will be easier in 0.8, since with replication you can phase in the new servers gradually, etc., no?
1. Start a mirror Kafka cluster with the new version on a separate zookeeper namespace. Configure it to mirror data from the existing Kafka cluster.
2. Move your consumers to pull data from the mirror.
3. For each producer, one at a time, change the zookeeper namespace to point to the mirror and restart the producer.
4. Once the producers have moved to the mirror cluster, shut down mirroring and the old cluster.
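The four steps above can be sketched as a toy simulation in Python. None of this is Kafka's actual mirroring tool or API. `ToyCluster`, `ToyMirror`, and `run_migration` are hypothetical, and each cluster is a single in-memory list. "Restarting a producer against the mirror's zookeeper namespace" is modeled as changing which cluster that producer appends to. The sketch also makes one detail explicit that the steps leave implicit: mirroring is stopped only after it has caught up with the old cluster.

```python
class ToyCluster:
    """Hypothetical in-memory cluster: one append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []


class ToyMirror:
    """Toy mirroring: copies anything new in `source` into `target`."""

    def __init__(self, source, target):
        self.source, self.target = source, target
        self.offset = 0  # next position in source.log not yet copied

    def step(self):
        self.target.log.extend(self.source.log[self.offset:])
        self.offset = len(self.source.log)

    def caught_up(self):
        return self.offset == len(self.source.log)


def run_migration(producers, rounds_per_producer=2):
    old, new = ToyCluster("old"), ToyCluster("mirror")
    mirror = ToyMirror(old, new)           # step 1: mirror old -> new
    route = {p: old for p in producers}    # producers still write to old
    produced, consumed = [], []
    pos, seq = 0, 0                        # consumer position in new.log, message counter

    def tick():
        nonlocal pos, seq
        for p in producers:
            msg = f"{p}-{seq}"
            route[p].log.append(msg)
            produced.append(msg)
        seq += 1
        mirror.step()
        consumed.extend(new.log[pos:])     # step 2: consumers read only the mirror
        pos = len(new.log)

    for p in producers:                    # step 3: switch producers one at a time
        for _ in range(rounds_per_producer):
            tick()
        route[p] = new                     # i.e. restart p against the mirror's zk namespace
    tick()

    mirror.step()                          # step 4: make sure mirroring has caught up,
    assert mirror.caught_up()              # then stop mirroring and retire `old`
    consumed.extend(new.log[pos:])
    return produced, consumed


produced, consumed = run_migration(["p1", "p2", "p3"])
print(len(produced), len(consumed), sorted(produced) == sorted(consumed))  # 21 21 True
```

Every message ends up in the mirror exactly once. Anything written to the old cluster is copied over by the mirror, and anything written after a producer's switch lands in the mirror directly.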
On Tuesday, March 19, 2013, Jason Rosenberg wrote:
On Wed, Mar 20, 2013 at 9:06 AM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
Not sure I understand. I need a node to be taken out of the pool that producers produce to, while consumers still consume from all brokers, so we can drain data from the brokers being replaced.
Since I am using zk for producers to discover brokers, there's no easy way to tell producers to stop producing to a subset of nodes without having the same effect on consumers.
Maybe I should first switch all producers to use a broker list pointing only to the new hosts, while still keeping all the hosts in zk. That might work, I think. But then I'm committing to not using zk for producers to connect to the brokers.
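A small Python helper along these lines could generate the producer-side config for that approach. It makes two assumptions, both worth checking against the 0.7.2 producer configuration docs. First, that the 0.7 producer's `broker.list` property takes comma-separated `brokerid:host:port` entries. Second, that `zk.connect` is left unset when `broker.list` is used. `producer_props`, `to_properties`, and the broker ids and hostnames in the example are hypothetical. The real ids would be whatever your framework auto-generates.

```python
def producer_props(brokers, new_hosts):
    """Build hypothetical 0.7-style producer properties routing only to new hosts.

    brokers: mapping of broker id -> (host, port), e.g. as read from zk.
    new_hosts: set of hostnames that should receive producer traffic.
    Assumes broker.list entries look like "brokerid:host:port".
    """
    entries = [
        f"{broker_id}:{host}:{port}"
        for broker_id, (host, port) in sorted(brokers.items())
        if host in new_hosts
    ]
    if not entries:
        raise ValueError("no new brokers found; refusing to build an empty broker.list")
    # zk.connect is deliberately omitted (assumption: it should not be combined with broker.list).
    return {"broker.list": ",".join(entries)}


def to_properties(props):
    """Render a dict as simple key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical ids/hosts: 1 and 2 are old brokers, 7 and 8 are the new ones.
brokers = {1: ("old-a", 9092), 2: ("old-b", 9092), 7: ("new-a", 9092), 8: ("new-b", 9092)}
props = producer_props(brokers, {"new-a", "new-b"})
print(to_properties(props))  # broker.list=7:new-a:9092,8:new-b:9092
```

The old hosts stay registered in zk, so zk-based consumers keep seeing and draining them, while producers configured this way only write to the new hosts.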
On Wed, Mar 20, 2013 at 10:55 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
Taking it out from behind the load balancer between Producers and Kafka means that Producers can no longer write to it. I said nothing about disconnecting the *Consumers*. :-)
On Wed, Mar 20, 2013 at 11:10 AM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
No worries, Philip. I'll assume you misspoke at first when you talked about a load balancer between the consumers and brokers. Unfortunately, Kafka doesn't allow consumers to connect to it through a load balancer. For producers, too, you can't really use a load balancer to connect to brokers: in 0.7.2 you can use zk or a broker list, and in 0.8 you can use an LB for the initial metadata connection, but each producer still needs direct connections to each broker.
Ah yes, I misspoke. I meant an LB between Producers and Kafka Brokers.
Huh? Sure you can, if the Producers are simple. Producers just need a destination IP address and port; they have no way of knowing whether that IP belongs to an LB or a real Kafka broker. Set the partition to -1 (i.e. random) in all messages destined for Kafka, and it all just works. Granted, this is a simple type of 0.7.2 deployment. Perhaps it doesn't work for more complex Producers or 0.8 deployments; I am not yet that familiar with 0.8.
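Philip's setup can be illustrated with a toy Python model. `ToyBroker`, `ToyLoadBalancer`, and `RANDOM_PARTITION` are hypothetical stand-ins, not Kafka classes. The producer only ever talks to the load balancer, which forwards each request round-robin. A broker that receives partition -1 picks one of its own partitions at random. Taking a broker out of the LB pool then stops producer traffic to it, while consumers, which connect to brokers directly, can still drain it.

```python
import itertools
import random

RANDOM_PARTITION = -1  # the "-1 means random" convention described above


class ToyBroker:
    """Hypothetical broker with a fixed number of in-memory partitions."""

    def __init__(self, name, num_partitions, rng):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]
        self.rng = rng

    def handle(self, partition, message):
        if partition == RANDOM_PARTITION:
            partition = self.rng.randrange(len(self.partitions))
        self.partitions[partition].append(message)

    def count(self):
        return sum(len(p) for p in self.partitions)


class ToyLoadBalancer:
    """Round-robin LB: the producer only ever sees this single endpoint."""

    def __init__(self, brokers):
        self._cycle = itertools.cycle(brokers)

    def send(self, partition, message):
        next(self._cycle).handle(partition, message)


rng = random.Random(0)
b1, b2 = ToyBroker("b1", 4, rng), ToyBroker("b2", 4, rng)
lb = ToyLoadBalancer([b1, b2])
for i in range(10):
    lb.send(RANDOM_PARTITION, f"m{i}")
print(b1.count(), b2.count())  # 5 5

# Take b1 out of the pool: producers keep writing, but only b2 receives data.
lb = ToyLoadBalancer([b2])
for i in range(10, 16):
    lb.send(RANDOM_PARTITION, f"m{i}")
print(b1.count(), b2.count())  # 5 11
```

This only works because the producer never picks a partition itself. A producer that needs a specific partition on a specific broker cannot sit behind an LB like this.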