Hi, we know that in Kafka 0.8 the producer connects to the broker directly, without connecting to ZooKeeper. How, then, does it achieve ZooKeeper-based load balancing on a per-request basis? When a topic is created, its partitions are distributed across one or more brokers. When a message is sent, it is delivered to a certain partition according to its key. That is to say, a message with a given key must be sent to a fixed partition on a fixed broker. How does the so-called load balancing work?
Basically, we spread partitions among multiple brokers. If a message is sent without a key, the producer picks a random partition to balance the load. If a message has a key, the default partitioner hashes the key to one of the partitions deterministically. In that case, the load may not always be balanced.
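The partition choice Jun describes can be sketched as follows. This is illustrative only: the real 0.8 partitioner is a pluggable Java class inside the client, and the hash function used here (CRC32) is just a stand-in for a deterministic hash.

```python
import random
import zlib

def choose_partition(key, num_partitions):
    """Sketch of the default partitioning behavior described above:
    a random partition when there is no key, a deterministic hash of
    the key otherwise."""
    if key is None:
        # Keyless messages are spread randomly, balancing the load.
        return random.randrange(num_partitions)
    # Keyed messages always land on the same partition, so per-key
    # ordering is preserved, but the load may be uneven if some keys
    # are much hotter than others.
    return zlib.crc32(key) % num_partitions
```

The same key always maps to the same partition, which is exactly why keyed traffic may not be evenly balanced.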
On Mon, Jan 14, 2013 at 9:35 PM, Jun Guo -X (jungu - CIIC at Cisco) < [EMAIL PROTECTED]> wrote:
You are asking two related but different questions. One is how load balancing works, and the other is how broker discovery works. Jun explained how load balancing works and how requests are routed to partitions. In 0.7, there were two options for broker discovery: ZooKeeper and a hardware load balancer (VIP). The ZooKeeper-based producer got notified by ZooKeeper whenever a new broker came up or an existing broker went down. In 0.8, we got rid of ZooKeeper from the producer and replaced it with a metadata API. On startup, and whenever a request fails, the producer refreshes its view of the cluster by sending a metadata request to any of the brokers. So if a broker goes down or a new broker comes up, the leader for some partitions might change, and the producer will know, since its requests to the older leaders will fail.
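The discovery loop Neha describes might look like the following sketch. `send_metadata_request` is a hypothetical stand-in for the real wire-protocol call; the point is only that any reachable broker can answer, and the reply tells the producer which broker leads each partition.

```python
def refresh_metadata(bootstrap_brokers, send_metadata_request):
    """Ask any reachable broker for the cluster view, i.e. a mapping
    from (topic, partition) to the current leader broker. Called on
    startup and again after any failed produce request."""
    for broker in bootstrap_brokers:
        try:
            # Any live broker can serve the metadata request.
            return send_metadata_request(broker)
        except ConnectionError:
            continue  # this broker is down; try the next one
    raise RuntimeError("no broker in the bootstrap list is reachable")
```

Because every broker can answer the metadata request, the bootstrap list only needs one live entry for the producer to learn about the whole cluster.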
Hope that helps, Neha On Mon, Jan 14, 2013 at 9:52 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
However, there is no need for a producer application to call the metadata API directly. Our producer client does this for you automatically.
For your first question: if a new broker is added, currently we don't move data to it automatically. One has to run an admin command, ReassignPartitionsCommand, to rebalance the data.
For your second question: yes, you need at least one broker in broker.list to be alive. Another option is to use a VIP in broker.list. That way, you can change the list of brokers associated with the VIP without reconfiguring the clients.
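The VIP setup Jun suggests might look like the following producer properties fragment. This is a sketch: the hostname is a placeholder, and the exact property name varies by version (the 0.8 producer calls it `metadata.broker.list`, while this thread uses the shorthand `broker.list`).

```properties
# Bootstrap list: only needs one live entry, since the full cluster
# view comes from the metadata request. Pointing it at a VIP lets
# the brokers behind it change without touching client config.
metadata.broker.list=kafka-vip.example.com:9092
```
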
The primary reason for removing ZK from the producer is to make it easier to write non-Java clients.
On Mon, Jan 14, 2013 at 10:28 PM, Jun Guo -X (jungu - CIIC at Cisco) < [EMAIL PROTECTED]> wrote:
Correct me if I'm wrong, but I believe in 0.8, the producer uses a metadata request to get topic/partition mappings from the broker. The broker then interacts with ZK (rather than having the producer do it using zk.connect).
In the event that the master for a topic/partition fails, a new master broker will be elected for a given topic/partition pair. When the producer tries to send to the old broker (which is either dead, or a slave now), the broker will either not respond, or the response will contain an error code. In either case, I think the producer will do a new metadata request (to a broker) to get the latest topic/partition to broker mappings. In this way, it avoids having to use ZooKeeper: it offloads all ZK work to the broker.
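The failover behavior described above can be sketched as a simple retry loop. The `send_to` and `refresh` callables are hypothetical stand-ins for the real produce request and metadata request; the point is the order of operations: try the cached leader, and on an error response re-fetch the mapping and retry against the new leader.

```python
def send_with_retry(record, metadata, refresh, send_to, max_retries=3):
    """Sketch of leader failover: send to the cached leader for the
    record's partition; if the request fails (old leader is dead or
    demoted), refresh the partition-to-leader mapping via a metadata
    request and retry."""
    for _ in range(max_retries):
        leader = metadata[(record["topic"], record["partition"])]
        if send_to(leader, record):
            return True  # the cached leader accepted the write
        # Error response or no response: the leadership has likely
        # moved, so re-fetch the cluster view and try the new leader.
        metadata = refresh()
    return False
```

Note that the producer never talks to ZooKeeper here; the broker answering the metadata request is the one that consults ZK.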
On 1/14/13 9:35 PM, "Jun Guo -X (jungu - CIIC at Cisco)" <[EMAIL PROTECTED]> wrote:
In this case, the broker sends a response with an error code to the producer, and then the producer retries the metadata request. Other than that, your understanding of the producer is correct.