Kafka dev mailing list: Troubles with compressed message set


Thread:
  David Arthur 2013-01-30, 15:27
  Jay Kreps 2013-01-30, 16:19
  David Arthur 2013-01-30, 21:00
  Jay Kreps 2013-01-30, 21:17
  David Arthur 2013-01-30, 21:23
Re: Troubles with compressed message set

On 1/30/13 11:18 AM, Jay Kreps wrote:
> Hi David,
>
> MessageSets aren't size delimited because that format is shared with
> producer/consumer and the kafka log itself. The log itself is just a big
> sequence of messages and any subset that begins and ends at valid message
> boundaries is a valid message set. This means that message sets are size
> prefixed only as part of the request/response. Not sure if that is what you
> are asking?
Other repeated elements in the protocol are preceded by an int32
indicating the number of repeated elements. E.g., topicCount in
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/api/ProducerRequest.scala#L38

MessageSet does not do this, it just starts writing out the messages
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/message/ByteBufferMessageSet.scala#L35.
To be more like the rest of the protocol, it would need to first do
something like: buffer.putInt(messages.size)
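[Editorial aside: the two encoding conventions being contrasted can be sketched as follows. This is illustrative Python, not Kafka source; the entry layout (offset as int64, size as int32, then the message bytes) reflects the 0.8-era wire format, and the function names are invented for this sketch.]

```python
import struct

def encode_counted(elements):
    """Convention used by most repeated fields in the protocol:
    an int32 element count, followed by the elements."""
    out = struct.pack(">i", len(elements))
    for e in elements:
        out += e
    return out

def encode_message_set(entries):
    """A MessageSet: entries are simply concatenated, with NO leading
    count. Each entry is offset (int64) + size (int32) + message bytes."""
    out = b""
    for offset, msg in entries:
        out += struct.pack(">qi", offset, len(msg)) + msg
    return out
```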

It's not really a big deal either way, but I thought it worth pointing
out, since it differs from the rest of the protocol and it tripped me
up.
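[Editorial aside: because there is no element count, a decoder has to scan the size-delimited buffer until it is exhausted, as David notes. A minimal sketch, assuming the 0.8-era entry layout of offset (int64) + size (int32) + message bytes; the function name is invented for illustration:]

```python
import struct

def parse_message_set(buf):
    """Parse a MessageSet: concatenated [offset, size, message] entries
    with no leading count, so we loop until the buffer runs out rather
    than iterating a known number of times."""
    messages = []
    pos = 0
    while pos + 12 <= len(buf):
        offset, size = struct.unpack_from(">qi", buf, pos)
        pos += 12
        if pos + size > len(buf):
            break  # trailing partial message; brokers may return one
        messages.append((offset, buf[pos:pos + size]))
        pos += size
    return messages
```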

-David
>
> It's hard to see the cause of the error you describe. I don't suppose you
> could send me a snapshot of your client to reproduce locally?
>
> -Jay
>
>
>
>
> On Wed, Jan 30, 2013 at 7:26 AM, David Arthur <[EMAIL PROTECTED]> wrote:
>
>> I'm working on a client and I'm running into difficulties with compressed
>> message sets. I am able to produce them fine, but when I go to fetch them,
>> things seem strange.
>>
>> I am sending a message whose value is a compressed message set. The inner
>> message set contains a single message. Specifically what looks weird is
>> that the key of the top message looks corrupt. Here is a trace of my
>> payloads:
>>
>> https://gist.github.com/bf134906f6559b0f54ad
>>
>> See the "???" down in the FetchResponse for what I mean. Also, the magic
>> byte and attributes are wrong.
>>
>> The data in the Kafka log for this partition matches what I get back for
>> the MessageSet in the FetchResponse:
>>
>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00;\xf5#\xc2N\
>> x00\x01\xff\xff\xff\xff\x00\x00\x00-\x1f\x8b\x08\x00\x00\
>> x00\x00\x00\x00\x00c`\x80\x03\x89P\xf7\xef\xccL
>> \x16sZ~>\x90bw\x8f\xf2\x0c\x08HM\x01\x00\xc5\x93\xd3<$\x00\x00\x00
>>
>> Another bit of weirdness here is how MessageSets are encoded. Everywhere
>> else in the API, we prefix a repeated element with a size of int32. When
>> encoding MessageSets, if I follow this convention, Kafka rejects the
>> produce request - if I exclude that int32 it works fine. I don't know if
>> this was intentional or not, but it is somewhat annoying and inconsistent.
>> When decoding MessageSets, I have to do-while instead of iterate a known
>> number of times.
>>
>> Cheers
>> -David
>>