Kafka, mail # dev - default configs


default configs
Jay Kreps 2013-01-18, 05:09
Currently, Kafka broker config is all statically defined in a properties
file that lives with the broker. This mostly works pretty well, but for per-topic
configuration (the flush policy, partition count, etc.) it is pretty painful
to have to bounce the broker every time you make a config change.

That led to this proposal:
https://cwiki.apache.org/confluence/display/KAFKA/Dynamic+Topic+Config

An open question is how topic-default configurations should work.

Currently each of our topic-level configs is paired with a default. So you
would have something like
  segment.size.bytes
which would be the default, and then you can override this for topics that
need something different using a map:
  segment.size.bytes.per.topic
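To make the current scheme concrete, here is a minimal Python sketch of how a broker might resolve one topic's segment size from the global default plus the per-topic override map. The `topic:value,topic:value` encoding of the `.per.topic` property and the helper names are assumptions for illustration, not necessarily what the broker actually parses.

```python
# Hypothetical sketch of the static scheme: a global default plus a
# per-topic override map, both read from the broker's properties file.
# ASSUMPTION: the .per.topic map is encoded as "topic:value,topic:value".

def parse_per_topic(raw: str) -> dict[str, int]:
    """Parse an (assumed) 'topic:value,topic:value' override string."""
    overrides: dict[str, int] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty strings and trailing commas
        topic, value = entry.rsplit(":", 1)
        overrides[topic.strip()] = int(value)
    return overrides


def segment_size_for(topic: str, props: dict[str, str]) -> int:
    """Return the topic's override if present, else the global default."""
    default = int(props["segment.size.bytes"])
    overrides = parse_per_topic(props.get("segment.size.bytes.per.topic", ""))
    return overrides.get(topic, default)


props = {
    "segment.size.bytes": "1073741824",
    "segment.size.bytes.per.topic": "clicks:536870912,logs:104857600",
}
print(segment_size_for("clicks", props))  # 536870912
print(segment_size_for("orders", props))  # 1073741824
```

Because both properties live in the static file, changing either one still means a broker bounce, which is the pain point above.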

The proposal is to move the topic configuration into zookeeper so that for
a topic "my-topic" we would have a znode
  /brokers/topics/my-topic/config
and the contents of this znode would be the topic configuration, stored as
JSON, properties, or whatever format we settle on.
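A minimal sketch of the proposed layout, assuming JSON as the encoding. The helper names are hypothetical, and a plain dict stands in for ZooKeeper; a real broker would read and write the znode through a ZooKeeper client instead.

```python
import json

# Hypothetical helpers for the proposed znode layout.
# ASSUMPTION: the config is serialized as a flat JSON object of strings.

def topic_config_path(topic: str) -> str:
    """Path layout taken from the proposal."""
    return f"/brokers/topics/{topic}/config"


def encode_config(config: dict[str, str]) -> bytes:
    # sort_keys makes the stored bytes deterministic, which eases diffing.
    return json.dumps(config, sort_keys=True).encode("utf-8")


def decode_config(data: bytes) -> dict[str, str]:
    return json.loads(data.decode("utf-8"))


fake_zk: dict[str, bytes] = {}  # in-memory stand-in for ZooKeeper znodes
fake_zk[topic_config_path("my-topic")] = encode_config({"segment.size.bytes": "536870912"})

print(topic_config_path("my-topic"))  # /brokers/topics/my-topic/config
print(decode_config(fake_zk["/brokers/topics/my-topic/config"]))
```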

There are two ways this config could work:
1. Defaults resolved at topic creation time: At the time a topic is created
the user would specify some properties they wanted for that topic, and any
property they didn't specify would take the server default. ALL these
properties would be stored in the znode.
2. Defaults resolved at config read time: When a topic is created the user
specifies the particular properties they want, and ONLY the properties they
explicitly specify would be stored. At runtime we would merge these
properties with whatever the server defaults currently are.
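The two options differ only in what gets written at creation time and what gets merged at read time. Here is a minimal sketch of both; the function names and property names are hypothetical, and a dict stands in for the znodes.

```python
# Hypothetical sketch of the two default-resolution strategies.
# `store` maps topic -> stored config (stand-in for the znodes);
# `server_defaults` stands in for the broker's properties file.

def create_topic_option1(store, topic, overrides, server_defaults):
    # Option 1: resolve defaults now and persist the full, merged config.
    store[topic] = {**server_defaults, **overrides}


def read_config_option1(store, topic, server_defaults):
    # server_defaults is deliberately ignored: the znode is the whole answer.
    return dict(store[topic])


def create_topic_option2(store, topic, overrides, server_defaults):
    # Option 2: persist only what the user explicitly asked for.
    store[topic] = dict(overrides)


def read_config_option2(store, topic, server_defaults):
    # Merge at read time; explicit overrides win over the current defaults.
    return {**server_defaults, **store[topic]}


defaults = {"retention.hours": "168", "flush.interval.messages": "10000"}
s1, s2 = {}, {}
create_topic_option1(s1, "my-topic", {"flush.interval.messages": "1"}, defaults)
create_topic_option2(s2, "my-topic", {"flush.interval.messages": "1"}, defaults)

# Later, an operator lowers the server-wide default retention.
new_defaults = {**defaults, "retention.hours": "72"}
print(read_config_option1(s1, "my-topic", new_defaults)["retention.hours"])  # 168
print(read_config_option2(s2, "my-topic", new_defaults)["retention.hours"])  # 72
```

The user only overrode the flush setting, yet once the default retention changes the two options disagree: option 1 still reports the value copied in at creation time, while option 2 picks up the new default.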

This is a somewhat nuanced point, but perhaps important.

The advantage of the first proposal is that it is simple. If you want to
know the configuration for a particular topic you go to zookeeper and look
at that topic's config. Dynamically combining server config with zookeeper
config makes it a little harder to figure out what the current state of
anything is.

The disadvantage of the first proposal (and the advantage of the second
proposal) is that global changes are harder. For example, if you want to
globally lower the retention for all topics, in proposal one you would
have to iterate over all topics and update each config (this could be
automated with tooling, but under the covers the tool would still do exactly this).
In the second case you would just update the default value.
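The global-change trade-off can be sketched the same way, again with hypothetical helpers and property names. One consequence visible in this sketch: because option 1 stores every property, a tool sweeping over topics has no record of which values the user chose explicitly, so a naive sweep overwrites those too.

```python
# Hypothetical sketch: lowering retention globally under each option.

def lower_retention_option1(store, hours):
    # Option 1: every topic's znode holds its own copy of the value,
    # so a global change means rewriting every topic's config.
    writes = 0
    for config in store.values():
        config["retention.hours"] = str(hours)
        writes += 1
    return writes


def lower_retention_option2(server_defaults, hours):
    # Option 2: only the default changes; topics without an explicit
    # override pick it up the next time their config is merged.
    server_defaults["retention.hours"] = str(hours)
    return 1


# Option 1: full configs were copied in at creation time.
store1 = {
    "a": {"retention.hours": "168"},
    "b": {"retention.hours": "168"},
    "c": {"retention.hours": "24"},  # explicitly chosen by the user
}
writes = lower_retention_option1(store1, 72)
print(writes)                         # 3
print(store1["c"]["retention.hours"])  # 72: the explicit choice is lost

# Option 2: only explicit overrides were stored.
defaults = {"retention.hours": "168"}
store2 = {"a": {}, "b": {}, "c": {"retention.hours": "24"}}
lower_retention_option2(defaults, 72)
effective = {t: {**defaults, **o}["retention.hours"] for t, o in store2.items()}
print(effective)  # {'a': '72', 'b': '72', 'c': '24'}
```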

Thoughts? If no one cares, I will just pick whatever seems best.

-Jay