While working on the new producer, I noticed another issue about error handling that I would like to ask for some discussions (I know Jay said the protocol definition is the last one, but I guess that can never be true :)
Currently we have two classes of exceptions that we can throw when sending messages with the producer, class A is more operational related and recoverable, such as:
1. Producer buffer full with BLOCK_ON_BUFFER_FULL set to false. 2. Message size too large. 3. Timed out on produce requests. 4. Error responses from brokers. etc
Class B is more related to client app bugs and fatal, such as:
1. Invalid specified partition Id. 2. Calling send() after close(). 3. Send a message with topic as null. etc
We have two options to throw them: either we wrap all exceptions in the returned future which will then only throw the exception if you call future.get(), and the send() call will not expect any checked exceptions; or we wrap class A exceptions in future and directly throw class B exceptions in send() call. The pros of the first option is that you only need to try/catch the future.get call if you care enough to check these errors, the cons is that if your async producer does not care about class A but only want to check class B errors, you still need to block wait on get() for all messages. And vice versa for the other option.
Originally (i.e. in the first patch) I threw any exception I could including all class B exceptions directly from the send() call and only threw api exceptions with the future. My concern with this approach was just the complexity of the error handling. You end up needing to try/catch around the send call to handle anything that is thrown there but then you also MUST try/catch around the Future (since it is a checked exception).
My concern with this is just that people would do it wrong and only do one of these. So I changed it to throw all exceptions through the Future. However this has another problem. There could be a set of users who just want fast asynchronous send and who won't check the result of the send at all. Their rationale could be that, yes, there could be occasional network errors but as long as 99.9% of their data gets through they are okay. The problem is here that for these use cases we essentially mask a set of coding errors by funneling them through the future (if you don't check you may not realize that you are producing invalid partitions, messages that are too large, or whatever). This may be okay if we just do a good job of documenting how errors work.
Alternately we could throw user errors from send. These you wouldn't need to catch because they should not happen (and if they do it is a bug, not something you should really catch and handle). The only concern here is that we have to be very careful to ensure we categorize each exception correctly. For example I am not sure if sending a message larger than the max message size the producer set in their own config is a class A or class B exception. Ostensibly it is an error on their part, but often I have found that people haven't really thought about the serialized size of their messages at all.
-Jay On Wed, Feb 12, 2014 at 5:43 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
Alternately we could throw user errors from send. These you wouldn't need to catch because they should not happen (and if they do it is a bug, not something you should really catch and handle). The only concern here is that we have to be very careful to ensure we categorize each exception correctly.
Thought about this. If I was a user of this API, I would prefer this approach. So you still don't have to wrap the send with try catch since those exceptions are not meant to be caught.
Thanks, Neha On Thu, Feb 13, 2014 at 8:34 AM, Jay Kreps <[EMAIL PROTECTED]> wrote: