Jun Rao 2013-01-29, 16:33
Re: 0.8 wire protocol for inter-broker communication
Jay Kreps 2013-01-29, 18:42
I don't think this is actually that hard to handle; you just need a config
to enable the new fields:
Step 1: Implement optional support for the new field with some option that
controls whether it is used
Step 2: Push all servers, still using the old format.
Step 3: Now enable the new field on servers one at a time.
This is a couple of steps but since server pushes are easy that should be
fine.
If we want to make this easy for upgrades we can have an
"enable.0.8.compat.mode=true" flag which enables or disables all these
together when we do an official release and document it in the release
notes.
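
To make the ordering argument concrete, here is a minimal sketch of the
rollout as a toy simulation. Everything in it is hypothetical (the Broker
class, the understands_new and send_new attributes, the message dicts); it
is not Kafka code and it only models "can the receiver parse what the sender
emits". It assumes every broker can always read the old format.

```python
from dataclasses import dataclass

OLD_VERSION = 0
NEW_VERSION = 1


@dataclass
class Broker:
    """Hypothetical broker model for illustration; not a Kafka class."""
    broker_id: int
    understands_new: bool = False  # code able to parse the new field is deployed
    send_new: bool = False         # config flag: emit the new field when sending

    def encode(self) -> dict:
        # The sender chooses the version purely from its config flag.
        if self.send_new:
            return {"version": NEW_VERSION, "payload": "req", "new_field": 42}
        return {"version": OLD_VERSION, "payload": "req"}

    def can_decode(self, request: dict) -> bool:
        # Every broker reads the old format; only upgraded code reads the new one.
        return request["version"] == OLD_VERSION or self.understands_new


def cluster_is_compatible(brokers) -> bool:
    """True if every broker can decode what every other broker sends."""
    return all(
        receiver.can_decode(sender.encode())
        for sender in brokers
        for receiver in brokers
        if sender is not receiver
    )


def rolling_upgrade(brokers) -> list:
    """Run steps 2 and 3 one broker at a time; record compatibility after each push."""
    history = []
    for broker in brokers:   # Step 2: push new code, still sending the old format
        broker.understands_new = True
        history.append(cluster_is_compatible(brokers))
    for broker in brokers:   # Step 3: enable the new field one server at a time
        broker.send_new = True
        history.append(cluster_is_compatible(brokers))
    return history
```

In this model the cluster stays compatible after every single push, whereas
enabling the new field on a broker while some peer still runs old code makes
cluster_is_compatible return False. A single compat flag would just be one
switch driving send_new for all of the changed requests at once.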
On Tue, Jan 29, 2013 at 8:33 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> In 0.8, we added versionId for each type of request. The plan is that if
> we want to evolve a particular request, we can implement the logic in the
> broker to support both the old and the new versions. Then, we can upgrade
> the server first, followed by the clients.
> However, this approach doesn't quite work for requests used among brokers.
> These include all requests sent by the controller (e.g.,
> LeaderAndIsrRequest) and FetchRequest (used by replica fetchers). If we
> want to evolve those requests, we will have to bring down the whole cluster
> to do the upgrade (since each broker is both a client and a server). This
> of course will make the cluster unavailable.
> So, we need to think about a couple of things. First, what's our strategy
> to evolve those inter-broker requests? One thing that I can think of is to
> do the upgrade in two passes. In the first pass, we upgrade all brokers
> first so that each of them is capable of receiving the new version, but not
> able to send the new version (this can be controlled by a config). In the
> second pass, we upgrade all brokers again by allowing them to send the new
> version. Not sure if this is the best way since this will make the upgrade
> a bit more complicated.
> Second, we probably need to make another pass of those requests to make
> sure that they are in good shape, since any change in the future may not be
> easy. For example, in LeaderAndIsr response, should we remove the global
> errorcode since we already have an errorcode per partition? Also, for the
> FetchRequest used by replica fetcher, currently we assume that the fetch
> offset equals the logEndOffset of the remote replica. If we want to
> pipeline those requests, this may not be true. So, we will need a separate
> field to represent logEndOffset.
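
As an illustration of the last point, here is a rough sketch of a
version-aware decoder for a replica fetch that carries logEndOffset as its
own field. The byte layout, the version numbers, and the names ReplicaFetch,
encode_fetch, and decode_fetch are all assumptions made for this example;
this is not the actual 0.8 wire format. Version 0 keeps today's assumption
(logEndOffset equals the fetch offset), and version 1 adds the separate field.

```python
import struct
from typing import NamedTuple, Optional


class ReplicaFetch(NamedTuple):
    """Hypothetical decoded form of a replica fetch; not Kafka's FetchRequest."""
    version: int
    fetch_offset: int
    log_end_offset: int


def encode_fetch(fetch_offset: int, log_end_offset: Optional[int] = None) -> bytes:
    # Assumed layout: int16 version, int64 fetch offset, and (v1 only) int64 LEO.
    if log_end_offset is None:
        return struct.pack(">hq", 0, fetch_offset)
    return struct.pack(">hqq", 1, fetch_offset, log_end_offset)


def decode_fetch(data: bytes) -> ReplicaFetch:
    # Read the version first, then branch on it so both layouts stay readable.
    (version,) = struct.unpack_from(">h", data, 0)
    if version == 0:
        (fetch_offset,) = struct.unpack_from(">q", data, 2)
        # Old format: the fetch offset doubles as the follower's logEndOffset.
        return ReplicaFetch(0, fetch_offset, fetch_offset)
    if version == 1:
        fetch_offset, log_end_offset = struct.unpack_from(">qq", data, 2)
        return ReplicaFetch(1, fetch_offset, log_end_offset)
    raise ValueError(f"unsupported version: {version}")
```

With pipelining, a follower could send encode_fetch(250, log_end_offset=100),
that is, ask for data at offset 250 while its log only reaches 100, and the
receiver would no longer have to infer one value from the other.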