Kafka, mail # dev - Question about offsets - 2013-01-30, 04:39
Question about offsets
While fiddling with my Python client on 0.8, I noticed that something
has changed with offsets.

It seems that instead of a byte offset in the log file, the offset is
now a logical one. I had a few questions about this:

1) How is the byte offset determined by the broker? Since messages are
not fixed width, does it use an index or do a simple binary search?
2) Regarding compressed MessageSets, how are the offsets incremented?
Suppose I have the following MessageSet

MessageSet A
  - Message A1, normal message
  - Message A2, normal message
  - Message A3, compressed MessageSet B
  - MessageSet B
     - Message B1
     - Message B2

Assuming we start from 0, message A1 gets offset 0 and A2 gets offset 1.
From there I am unclear how the numbering goes. Would A3 get offset 2,
with B1 -> 3 and B2 -> 4? Or are the offsets inside a compressed
MessageSet not used?
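
To make the two possibilities concrete, here is a rough Python sketch of both numbering schemes applied to MessageSet A. The names `Message`, `number_all`, and `number_outer_only` are made up for illustration and are not part of Kafka or any client library. I am not claiming either one is what the broker actually does; this only spells out what each scheme would produce.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List


@dataclass
class Message:
    name: str
    # Non-empty only when this message wraps a compressed MessageSet.
    inner: List["Message"] = field(default_factory=list)


def number_all(messages: List[Message], counter: Iterator[int]) -> Dict[str, int]:
    """Possibility 1: the wrapper and every inner message each take an offset."""
    offsets: Dict[str, int] = {}
    for msg in messages:
        offsets[msg.name] = next(counter)
        # Recurse so nested compressed sets keep drawing from the same counter.
        offsets.update(number_all(msg.inner, counter))
    return offsets


def number_outer_only(messages: List[Message], counter: Iterator[int]) -> Dict[str, int]:
    """Possibility 2: only top-level messages take an offset; inner ones get none."""
    return {msg.name: next(counter) for msg in messages}


# MessageSet A from the example above; A3 wraps compressed MessageSet B.
message_set_a = [
    Message("A1"),
    Message("A2"),
    Message("A3", inner=[Message("B1"), Message("B2")]),
]

print(number_all(message_set_a, count(0)))
print(number_outer_only(message_set_a, count(0)))
```

Under the second scheme B1 and B2 have no offset of their own, which is why I am asking whether a message inside a compressed set can be requested directly.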

Is it possible to request a message inside a compressed message set?

Also, what about nested compression sets? What if Message B3 is itself a
compressed MessageSet (not that it makes sense, just curious what would
