Do you read from the file inside the callback from Kafka? I just implemented C++ bindings, and in one of my tests I got the following results:
1000 messages per batch (fairly small messages, ~150 bytes each), waiting for the network layer to ack the send (not a server ack) before putting another message on the TCP socket. This gives me an average latency of about 17 ms and a throughput of about 10 MB/s.
If you are serializing your requests and reading data from disk between calls to Kafka, then that would easily explain some added milliseconds in each call and thus a reduced throughput. Partitioning will not reduce latency.