Kafka >> mail # dev >> Question about offsets


Question about offsets
While fiddling with my Python client on 0.8, I noticed that something has
changed with offsets.

It seems that instead of a byte offset in the log file, the offset is
now a logical one. I had a few questions about this:

1) How is the byte offset determined by the broker? Since messages are
not fixed width, does it use an index or do a simple binary search?
2) Regarding compressed MessageSets, how are the offsets incremented?
Suppose I have the following MessageSet

MessageSet A
  - Message A1, normal message
  - Message A2, normal message
  - Message A3, compressed MessageSet B
      MessageSet B
        - Message B1
        - Message B2

Assuming we start from 0, message A1 gets offset 0 and A2 gets offset 1.
From there I am unclear how the numbering goes. Would A3 get offset 2,
with B1 -> 3 and B2 -> 4? Or are the offsets inside a compressed
MessageSet not used?
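
To make the two readings concrete, here is a minimal Python sketch. It is
a toy model, not Kafka code: the tuple layout and both function names are
assumptions made for illustration, and it does not claim to show what the
broker actually does.

```python
# Toy model (hypothetical): a message is (name, inner), where inner is None
# for a normal message or a list of messages for a compressed MessageSet.
message_set_a = [
    ("A1", None),
    ("A2", None),
    ("A3", [("B1", None), ("B2", None)]),  # A3 wraps compressed MessageSet B
]


def number_wrapper_and_inner(messages, start=0):
    """Reading 1: the wrapper and each inner message all consume an offset."""
    offsets = {}
    next_offset = start
    for name, inner in messages:
        offsets[name] = next_offset
        next_offset += 1
        if inner is not None:
            # Recurse so that nested compressed sets are numbered the same way.
            inner_offsets, next_offset = number_wrapper_and_inner(inner, next_offset)
            offsets.update(inner_offsets)
    return offsets, next_offset


def number_outer_only(messages, start=0):
    """Reading 2: only top-level messages get offsets; inner ones get none."""
    offsets = {name: start + i for i, (name, _inner) in enumerate(messages)}
    return offsets, start + len(messages)


print(number_wrapper_and_inner(message_set_a)[0])
# {'A1': 0, 'A2': 1, 'A3': 2, 'B1': 3, 'B2': 4}
print(number_outer_only(message_set_a)[0])
# {'A1': 0, 'A2': 1, 'A3': 2}
```

Under the first reading the next free offset after MessageSet A is 5, and
under the second it is 3, so the two schemes diverge for every message
that follows a compressed set.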

Is it possible to request a message inside a compressed message set?

Also, what about nested compressed sets? What if a message inside
MessageSet B (say, a hypothetical B3) is itself a compressed MessageSet?
Not that it makes sense, I am just curious what would happen.

Thanks!
-David