Kafka >> mail # dev >> Question about offsets


Re: Question about offsets
Yes, offsets are now logical: 0, 1, 2, 3...

Some details on this change are here:
https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal
https://issues.apache.org/jira/browse/KAFKA-506
https://issues.apache.org/jira/browse/KAFKA-631

There were some previous threads on this that discussed the benefits. The
main goals included:
1. Allow a key-based retention policy in addition to time-based retention.
This lets a topic retain the latest value for each message key, for
data streams that contain updates.
2. Fix the semantics of compressed messages (which are weird in 0.7 because
the compressed messages essentially have no offset and are unaddressable,
so a commit() in the middle of a compressed message set will give
duplicates!).
3. Allow odd iteration patterns (skipping ahead N messages, reading
messages backwards, etc.).

It is also just nicer for humans and less prone to weird errors.
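The key-based retention in goal 1 can be sketched roughly as follows; this is a toy illustration of the idea (retain only the latest value per key), not the actual broker implementation, and the record layout is hypothetical:

```python
def compact(log):
    """Toy key-based retention: log is a list of (offset, key, value)
    tuples in offset order; keep only the latest record per key."""
    latest = {}  # key -> most recent (offset, key, value)
    for record in log:
        _, key, _ = record
        latest[key] = record  # a later record for the same key overwrites
    # surviving records come back in offset order
    return sorted(latest.values())

log = [(0, "a", "v1"), (1, "b", "v1"), (2, "a", "v2")]
compact(log)  # keeps (1, "b", "v1") and (2, "a", "v2"); the stale (0, "a", "v1") is dropped
```

A stream of updates keyed by, say, user id can thus be retained indefinitely while staying bounded by the number of distinct keys.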

To answer your questions:

1. Lookups are done using a memory-mapped index file kept with the log that
maps logical offsets to byte offsets. This index is sparse (it need not
contain every message). Once the begin and end positions in the log segment
are found, the data transfer happens the same as before.
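The sparse-index lookup amounts to a floor search: find the greatest indexed offset at or below the target, then scan forward from that byte position. A minimal sketch (the index entries here are made up, and the real index is a memory-mapped binary file, not a Python list):

```python
import bisect

# Hypothetical sparse index: sorted (logical_offset, byte_position)
# pairs covering only some of the messages in a segment.
index = [(0, 0), (100, 4096), (200, 8192)]

def floor_entry(index, target_offset):
    """Return the index entry with the greatest logical offset
    <= target_offset; the log is then scanned forward from that
    byte position to locate the exact message."""
    offsets = [off for off, _ in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    return index[i]

floor_entry(index, 150)  # -> (100, 4096): start scanning at byte 4096
```

Because the index is sparse, it stays small enough to map into memory even for large segments; the forward scan covers at most the gap between two index entries.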

2. The rule is that all messages have offsets stored with the messages.
These are assigned by the server. So in your example, the offsets of B1, B2
will be sequentially assigned and the wrapper message will have the offset
of the last message in the compressed set.
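The assignment rule for a compressed batch can be sketched like this (a simplified model of the rule stated above, not the broker's actual code; `assign_offsets` and its signature are invented for illustration):

```python
def assign_offsets(next_offset, batch):
    """Server-side sketch: give each inner message of a compressed
    batch a sequential offset; the wrapper message carries the
    offset of the last inner message."""
    inner = [(next_offset + i, msg) for i, msg in enumerate(batch)]
    wrapper_offset = inner[-1][0]
    return inner, wrapper_offset

# B1 and B2 arriving when the log's next offset is 5:
inner, wrapper = assign_offsets(5, ["B1", "B2"])
# inner messages get offsets 5 and 6; the wrapper gets offset 6
```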

As a result, in 0.8 requesting any offset in a compressed message set will
yield that whole compressed message set (you need the whole thing to
decompress).
This means the java client now works correctly with commit() and compressed
message sets--if you commit in the middle of a compressed message set and
restart from that point you will read the compressed set again, throw away
the messages with offset lower than your consumer's offset, and then start
consuming each message after that.
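That resume-and-skip behavior is easy to model; a sketch under the assumption that the committed offset is the next offset to consume (so anything below it is discarded after decompression):

```python
def resume(decompressed_set, committed_offset):
    """Consumer-side sketch: after a restart, refetch and decompress
    the whole message set, then drop every message whose offset is
    below the committed offset to avoid re-delivering duplicates."""
    return [(off, msg) for off, msg in decompressed_set
            if off >= committed_offset]

batch = [(10, "m0"), (11, "m1"), (12, "m2")]
resume(batch, 11)  # m0 is discarded; consumption resumes at offset 11
```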

Hope that makes sense.

We really need to get this stuff documented since it is all floating around
on various wikis and JIRAs.

-Jay
On Tue, Jan 29, 2013 at 8:38 PM, David Arthur <[EMAIL PROTECTED]> wrote: