Kafka >> mail # dev >> Troubles with compressed message set

Re: Troubles with compressed message set

On 1/30/13 11:18 AM, Jay Kreps wrote:
> Hi David,
> MessageSets aren't size delimited because that format is shared with
> producer/consumer and the kafka log itself. The log itself is just a big
> sequence of messages and any subset that begins and ends at valid message
> boundaries is a valid message set. This means that message sets are size
> prefixed only as part of the request/response. Not sure if that is what you
> are asking?
Other repeated elements in the protocol are preceded by an int32 to
indicate the number of repeated elements. E.g., topicCount in

MessageSet does not do this; it just starts writing out the messages.
To be more like the rest of the protocol, it would need to first do
something like: buffer.putInt(messages.size)

It's not really a big deal either way, but I thought it important to
point out since it's different than the rest of the protocol and it
tripped me up.
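
To make the difference concrete, here is a minimal Python sketch of the two
framing styles. It assumes the 0.8 wire layout, in which each message set
entry is an int64 offset, an int32 message size, and then the message bytes.
The function names are made up for illustration and are not taken from any
Kafka client.

```python
import struct

def encode_counted_array(items):
    """Framing used by other repeated elements: int32 count, then the items.
    Each item here is an int32-length-prefixed byte string (illustrative)."""
    out = struct.pack(">i", len(items))
    for item in items:
        out += struct.pack(">i", len(item)) + item
    return out

def encode_message_set(messages):
    """MessageSet framing: no count, entries are simply concatenated.
    `messages` is a list of (offset, message_bytes) pairs."""
    out = b""
    for offset, msg in messages:
        out += struct.pack(">qi", offset, len(msg)) + msg
    return out

def decode_message_set(buf):
    """Read entries until the buffer runs out, since there is no count."""
    entries = []
    pos = 0
    while pos + 12 <= len(buf):
        offset, size = struct.unpack_from(">qi", buf, pos)
        pos += 12
        if pos + size > len(buf):
            break  # incomplete trailing entry: keep only the complete ones
        entries.append((offset, buf[pos:pos + size]))
        pos += size
    return entries
```

In a request or response, the only size information is the byte length that
precedes the whole set, so the decoder loops on remaining bytes rather than
on an element count.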

> It's hard to see the cause of the error you describe. I don't suppose you
> could send me a snapshot of your client to reproduce locally?
> -Jay
> On Wed, Jan 30, 2013 at 7:26 AM, David Arthur <[EMAIL PROTECTED]> wrote:
>> I'm working on a client and I'm running into difficulties with compressed
>> message sets. I am able to produce them fine, but when I go to fetch them,
>> things seem strange.
>> I am sending a message whose value is a compressed message set. The inner
>> message set contains a single message. Specifically what looks weird is
>> that the key of the top message looks corrupt. Here is a trace of my
>> payloads:
>> https://gist.github.com/bf134906f6559b0f54ad
>> See the "???" down in the FetchResponse for what I mean. Also the magic
>> byte and attributes are wrong.
>> The data in the Kafka log for this partition matches what I get back for
>> the MessageSet in the FetchResponse:
>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00;\xf5#\xc2N\x00\x01\xff\xff\xff\xff\x00\x00\x00-\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00c`\x80\x03\x89P\xf7\xef\xccL \x16sZ~>\x90bw\x8f\xf2\x0c\x08HM\x01\x00\xc5\x93\xd3<$\x00\x00\x00
>> Another bit of weirdness here is how MessageSets are encoded. Everywhere
>> else in the API, we prefix a repeated element with a size of int32. When
>> encoding MessageSets, if I follow this convention, Kafka rejects the
>> produce request - if I exclude that int32 it works fine. I don't know if
>> this was intentional or not, but it is somewhat annoying and inconsistent.
>> When decoding MessageSets, I have to use a do-while loop instead of
>> iterating a known number of times.
>> Cheers
>> -David
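
For reference, the log bytes quoted above can be picked apart by hand. The
sketch below assumes the 0.8 message layout (int64 offset, int32 size, int32
CRC, int8 magic, int8 attributes, then int32-length-prefixed key and value,
with -1 meaning null). It also assumes the space between \xccL and \x16 in
the dump is a real 0x20 byte and not a line-wrap artifact. The helper and
field names are illustrative only.

```python
import struct

# Bytes as quoted from the partition's log, with line-wrap debris removed.
raw = (
    b"\x00\x00\x00\x00\x00\x00\x00\x00"   # offset
    b"\x00\x00\x00;"                      # message size
    b"\xf5#\xc2N"                         # crc
    b"\x00\x01"                           # magic, attributes
    b"\xff\xff\xff\xff"                   # key length
    b"\x00\x00\x00-"                      # value length
    b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00"
    b"c`\x80\x03\x89P\xf7\xef\xccL \x16sZ~>\x90bw\x8f\xf2\x0c\x08HM\x01\x00"
    b"\xc5\x93\xd3<$\x00\x00\x00"
)

def read_bytes(buf, pos):
    """Read an int32-length-prefixed byte string; a negative length is null."""
    (length,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    if length < 0:
        return None, pos
    return buf[pos:pos + length], pos + length

def parse_entry(buf, pos=0):
    """Parse one message set entry under the assumed 0.8 layout."""
    offset, size = struct.unpack_from(">qi", buf, pos)
    pos += 12
    crc, magic, attributes = struct.unpack_from(">IBB", buf, pos)
    pos += 6
    key, pos = read_bytes(buf, pos)
    value, pos = read_bytes(buf, pos)
    return {
        "offset": offset, "size": size, "crc": crc, "magic": magic,
        "attributes": attributes, "key": key, "value": value, "end": pos,
    }

fields = parse_entry(raw)
for name in ("offset", "size", "magic", "attributes", "key"):
    print(name, fields[name])
print("value length", len(fields["value"]))
# The last four bytes of a gzip stream hold the uncompressed size,
# little-endian (ISIZE in RFC 1952).
print("inner size", struct.unpack("<I", fields["value"][-4:])[0])
```

Read this way, the entry has offset 0, size 59, magic 0, attributes 1, a null
key, and a 45-byte value that begins with the gzip magic bytes 1f 8b. The
gzip trailer reports 36 uncompressed bytes.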