We've made some progress in our testing.  While I do not have a good
explanation for all of today's better behavior, we have been able to move
a substantial number of messages (> 800K) through the system without any
exceptions.

The big changes between last night's mess and today were: 1. I moved the
Kafka log dir (the segment files) to a separate drive from the system
drive, and 2. I reduced the number of network and io threads back down to
2 each.
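
For reference, the two changes would look roughly like this in the broker's
server.properties. This is only a sketch: it assumes the 0.8-style property
names, and the path is a made-up placeholder, not our actual drive layout.

```properties
# Hypothetical path: put the segment files on a drive other than the system drive
log.dirs=/data/kafka-logs

# Thread counts reduced back down to 2 each
num.network.threads=2
num.io.threads=2
```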

We also found a (probably) unrelated bug where we were getting the broker 0
and broker 1 host name mappings swapped (something about Zookeeper
returning children in any old order), so we weren't asking for topic
offsets from the correct broker.  The code worked fine when there was only
one broker, but in a multi-broker cluster, we got bogus results.
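
To illustrate the kind of bug this was, here is a minimal Python sketch (not
our actual code). It assumes the child node names are the broker ids, and it
uses a plain dict with made-up host names in place of a real Zookeeper
client; broker_hosts_by_id and read_host are hypothetical names. The point is
to key the mapping on the broker id taken from each child's name rather than
on the child's position in the returned list.

```python
from typing import Callable, Dict, Iterable


def broker_hosts_by_id(
    children: Iterable[str], read_host: Callable[[str], str]
) -> Dict[int, str]:
    """Map broker id -> host using the id in each child name, not list position."""
    return {int(name): read_host(name) for name in children}


# Hypothetical registry standing in for the data stored under each child node.
registry = {"0": "kafka-a.example", "1": "kafka-b.example"}

# Zookeeper makes no promise about child order, so simulate a swapped listing.
shuffled_children = ["1", "0"]

# Keyed by id: the result does not depend on the order of the listing.
hosts = broker_hosts_by_id(shuffled_children, registry.__getitem__)

# Positional lookup (the buggy approach): index 0 is treated as broker 0.
positional = [registry[name] for name in shuffled_children]
```

With only one broker the positional version cannot go wrong, which matches
what we saw: the problem only shows up once there are two or more children.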

