Hi Jay,

On Tue, Nov 21, 2017, at 19:03, Jay Kreps wrote:

I think there are two big reasons to consider separating
IncrementalFetchRequest from FetchRequest.

As you say, the first reason is compatibility.  We will have to support
the full FetchRequest for a long time to come because of our
compatibility policy.  It would be good from a code quality point of
view to avoid having widely diverging code paths for different versions
of this request.

The other reason is that conceptually I feel that there should be both
full and incremental fetch requests.  This is similar to how HDFS has
both incremental and full block reports.  The full reports are necessary
when a node is restarted.  In HDFS, they also serve a periodic sanity
check if the DataNode's view of what blocks exist has become
desynchronized from the NameNode's view.  While in theory you could
avoid the sanity check, in practice it often was important.

Also, just to be clear, I don't think we should convert KafkaConsumer to
using incremental fetch requests.  It seems inadvisable to allocate
broker memory for each KafkaConsumer.  After all, there can be quite a
few consumers, and we don't know ahead of time how many there will be.
This is very different than brokers, where there are a small,
more-or-less constant, number.  Also, consumers tend not to consume from
a massive number of topics all at once, so I don't think they have the
same problems with the existing FetchRequest RPC as followers do.

Sorry, I may have done a poor job explaining the proposal.  The
intention is that you cannot change the set of partitions you are
receiving information about except by making a full FetchRequest.  If
you need to make any changes to the watch set whatsoever, you must make
a full request, not an incremental.  The idea is that changes are very
infrequent, so we don't need to optimize this at the moment.

I think your proposal is actually closer to what I was intending than
you thought.  Like I said above, I believe watch-set-change operations
should require a full fetch request.  It is certainly simpler to
implement and understand.

If I understand your proposal correctly, you are suggesting that the
existing FetchRequest RPC should be able to do double duty as either a
full or an incremental request?

best,
Colin

NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB