Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # dev >> HighwaterMarkCheckpoint

Copy link to this message
Re: HighwaterMarkCheckpoint
Does this capture the intention of this code:

The idea is to rename it to OffsetCheckpoint so it is generic. To do this
1. I make the file name and argument
2. read now returns a Map of all values. To get the old behavior you would
do chkpt.read.getOrElse(TopicPartition(t,p), 0L). This makes sense as
defaulting to 0L may only make sense for highwatermarks.
3. Throw and exception rather than calling shutdown (always a good idea)

Some things I am confused about:
1. Parsing logic--there is very complex being done but I don't know why.
Are we just parsing three space-separated values? Why all the indexOf stuff?
2. Why do we store the number of entries in the file?


On Mon, Nov 26, 2012 at 4:14 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> The api for the HighwaterMarkCheckpoint is:
>   def write(highwaterMarksPerPartition: Map[TopicAndPartition, Long])
>   def read(topic: String, partition: Int): Long
> Pretty weird, no? The write method writes a map of all values,
> obliterating previous values. The read method internally reads back that
> whole map, then throws away all the values but one and returns that one. It
> seems like the natural way to do it would be to either read and write one
> key/value pair or read and write them all together.
> Is there a rationale for this? Can we change it?
> It would be nice to reuse this class to track log flushes, and the cleaner
> point. However in its current form it isn't particularly useful.
> -Jay