Jun Rao 2013-02-12, 14:44
David Arthur 2013-02-13, 15:09
This is a good idea. There are actually two ways to implement this:
1. A RESTful interface, as Jun mentions. This might make more sense,
since if you don't mind the overhead of sending all the data twice,
you probably won't mind the overhead of HTTP.
2. Re-route misdirected requests in the brokers.
Effectively implementing re-routing of requests requires a
non-blocking request router; otherwise you end up blocking a thread
just waiting on the forwarded request. That might be acceptable (maybe
just double the number of threads), but it isn't ideal. Our producer
doesn't currently do this kind of request pipelining, so a naive
implementation that simply sent a produce request onward whenever the
request wasn't local wouldn't quite do it. The latter strategy could be
implemented much better when we have a
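Option 2 depends on forwarding without parking a thread for each misdirected request. Below is a minimal sketch of that idea using Python's asyncio, not anything in Kafka's actual broker: `Broker`, `handle_produce`, and the shared `leaders` map are hypothetical names, and `asyncio.sleep(0)` merely stands in for the network hop to the leader.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical in-memory broker used only to illustrate re-routing."""
    broker_id: int
    leaders: dict                                          # (topic, partition) -> leader broker_id
    peers: dict = field(default_factory=dict, repr=False)  # broker_id -> Broker
    log: list = field(default_factory=list)                # records appended on this broker

    async def handle_produce(self, topic, partition, message, hops=0):
        leader_id = self.leaders[(topic, partition)]
        if leader_id == self.broker_id:
            # This broker leads the partition: append locally and reply.
            self.log.append((topic, partition, message))
            return {"served_by": self.broker_id, "hops": hops}
        # Not the leader: forward and await the leader's reply. `await`
        # suspends only this coroutine, so no thread sits blocked on the hop.
        await asyncio.sleep(0)  # stand-in for network latency to the leader
        leader = self.peers[leader_id]
        return await leader.handle_produce(topic, partition, message, hops + 1)


async def demo():
    leaders = {("events", 0): 1, ("events", 1): 2}
    b1, b2 = Broker(1, leaders), Broker(2, leaders)
    b1.peers, b2.peers = {2: b2}, {1: b1}
    # A "dumb" client sends everything to broker 1, whoever the leader is.
    results = await asyncio.gather(
        b1.handle_produce("events", 0, b"a"),
        b1.handle_produce("events", 1, b"b"),
    )
    return results, b1, b2


results, b1, b2 = asyncio.run(demo())
```

Because the forwarding broker awaits the leader instead of blocking, a single event loop can keep many forwarded requests in flight. The sketch assumes every broker holds a complete, current leader map, which, as Jun notes, 0.8 brokers do not.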
On Wed, Feb 13, 2013 at 7:08 AM, David Arthur <[EMAIL PROTECTED]> wrote:
> Thanks, Jun, this answers my questions.
> I wasn't necessarily thinking of an HTTP interface like Solr's, but rather
> of the way it routes requests to leaders. However, since brokers are not
> aware of all the partition leaders, the Solr approach will not work.
> I actually worked a bit on a REST interface a while ago:
> https://github.com/mumrah/kafka/tree/rest/contrib/rest-proxy, once 0.8 is
> out I might pick it up and clean it up a bit.
> On 2/12/13 9:44 AM, Jun Rao wrote:
>> The benefit of the strategy used in Solr is that it simplifies client
>> routing. The downside is potential additional RPC overhead and a bit more
>> logic in the server. Technically, you can achieve what Solr does in the
>> client layer too: run a proxy that uses the Java version of the Kafka
>> producer and exposes a RESTful API. Your non-Java client can then talk to
>> the proxy.
>> We do plan to support a RESTful API for the producer in the future. The
>> Solr strategy needs more thought, since currently not every broker
>> knows the leader of every partition.
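The proxy idea can be sketched in a few lines. This is a hypothetical illustration in Python using only the standard library: in the real setup the proxy would embed the Java producer, for which `RecordingProducer` merely stands in, and the `POST /topics/<topic>` URL scheme is an assumption, not an existing Kafka API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingProducer:
    """Hypothetical stand-in for the Java producer the proxy would embed.
    The real producer would own leader lookup and retries; this one only
    records what it was asked to send."""

    def __init__(self):
        self.sent = []

    def send(self, topic, message):
        self.sent.append((topic, message))


def make_handler(producer):
    class ProduceHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Assumed URL scheme: POST /topics/<topic>, message bytes as the body.
            parts = self.path.strip("/").split("/")
            if len(parts) != 2 or parts[0] != "topics":
                self.send_error(404)
                return
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            producer.send(parts[1], body)
            reply = json.dumps({"topic": parts[1], "bytes": len(body)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

        def log_message(self, format, *args):
            pass  # silence per-request logging in this sketch

    return ProduceHandler


def start_proxy(producer, host="127.0.0.1", port=0):
    """Serve in a background thread; port=0 lets the OS pick a free port."""
    server = HTTPServer((host, port), make_handler(producer))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_message(server, topic, body):
    """What a non-Java client does: a plain HTTP POST, no broker metadata."""
    host, port = server.server_address[:2]
    request = urllib.request.Request(
        f"http://{host}:{port}/topics/{topic}", data=body, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

All leader bookkeeping lives behind `producer.send`, so the HTTP client never needs to know which broker leads which partition; the price is the extra hop through the proxy.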
>> ---------- Forwarded message ----------
>> From: David Arthur <[EMAIL PROTECTED]>
>> Date: Mon, Feb 11, 2013 at 7:45 AM
>> Subject: Clients and replica leaders
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> In writing a client for 0.8, I now have to keep track of which broker
>> owns each topic+partition. This is inherently a pain to deal
>> with and has the downside that I must wait for an error before I am
>> notified about a change in the broker topology.
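The bookkeeping described here can be sketched as a small client-side leader cache that refreshes only after a broker rejects a request. Everything below is hypothetical: `LeaderCache`, `NotLeaderError`, and the `fetch_metadata` / `send_to_broker` callables stand in for a real client's metadata request and network layer.

```python
class NotLeaderError(Exception):
    """Hypothetical error a broker raises when it does not lead the partition."""


class LeaderCache:
    """Client-side map of (topic, partition) -> leader broker id."""

    def __init__(self, fetch_metadata):
        # fetch_metadata: hypothetical callable returning {(topic, partition): broker_id}
        self._fetch_metadata = fetch_metadata
        self._leaders = {}
        self.refreshes = 0

    def refresh(self):
        # Copy the snapshot so later cluster changes leave the cache stale,
        # as they would for a real client.
        self._leaders = dict(self._fetch_metadata())
        self.refreshes += 1

    def leader_for(self, topic, partition):
        if (topic, partition) not in self._leaders:
            self.refresh()
        return self._leaders[(topic, partition)]

    def send(self, topic, partition, message, send_to_broker, max_attempts=3):
        for _ in range(max_attempts):
            broker_id = self.leader_for(topic, partition)
            try:
                return send_to_broker(broker_id, topic, partition, message)
            except NotLeaderError:
                # Only a failed request reveals that leadership moved.
                self.refresh()
        raise RuntimeError(f"no leader found for {topic}-{partition}")


# Usage: leadership of events-0 moves from broker 1 to broker 2.
cluster = {("events", 0): 1}


def send_to_broker(broker_id, topic, partition, message):
    if cluster[(topic, partition)] != broker_id:
        raise NotLeaderError(broker_id)
    return broker_id


cache = LeaderCache(lambda: cluster)
first = cache.send("events", 0, b"x", send_to_broker)   # served by broker 1
cluster[("events", 0)] = 2                               # leadership moves
second = cache.send("events", 0, b"y", send_to_broker)  # fails once, then broker 2
```

The `except NotLeaderError` branch is exactly the drawback described above: the client learns of a leadership change only after a failed send.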
>> It would be nice if the clients didn't need to know so much about the
>> brokers. In Apache Solr, which actually has a similar
>> strategy, each server (broker) can handle requests for any shard
>> (partition) in the cluster. If the current server happens to be the leader,
>> it processes the request; if not, it forwards it to the correct
>> server, waits for a response, then passes the response back to the client.
>> Dumb clients pay the extra cost of the additional hop, but do not need
>> to know anything about the brokers. Smart clients will work basically as
>> they do now, with the added benefit of not getting an error when leader
>> Would a strategy like this work in 0.8? Do the brokers know about one