Cool. With respect to compression performance, we definitely see the same
thing, no debate.

Of course if you want to just compress the message payloads you can do that
now without needing much help from kafka--just pass in the compressed data.
Whether it not it will do much depends on the size of the message body--for
small messages you basically need batch compression, but for large messages
just compressing the body is fine. Our extra effort was to get the better
compression ratio of compressed messages.

What I was saying about snappy performance is that I think it may be our
our inefficiency in the compression code-path rather than the underlying
slowness of snappy. For example on this page
The compression performance they list for jni (the library we use) tends to
be around 200MB per core-second, with decompression around 1GB per
core-second. So on a modern machine with umpteen cores that should not be a
bottleneck, right? I don't know this to be true but I am wondering if the
the underlying bottleneck is the compression algorithm or our inefficient
code. If you look at kafka.message.ByteBufferMessageSet.{create,
decompress, and assignOffsets} it is pretty inefficient. I did a round of
improvement there but we are still recopying stuff over and over and
creating zillions of little buffers and objects. It is a little tricky to
clean up but probably just a 1-2 day project.

I would rather figure out that it is really the compression that is the
root cause rather than just our inefficiency before we do anything too
drastic design wise. If this is really killing you guys, and if that turns
out to be the cause, we would definitely take a patch to optimize that path

On Fri, Aug 2, 2013 at 4:55 PM, Chris Hogue <[EMAIL PROTECTED]> wrote:
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB