The api for the HighwaterMarkCheckpoint is:
def write(highwaterMarksPerPartition: Map[TopicAndPartition, Long])
def read(topic: String, partition: Int): Long
Pretty weird, no? The write method writes a map of all values, obliterating
previous values. The read method internally reads back that whole map, then
throws away all the values but one and returns that one. It seems like the
natural way to do it would be to either read and write one key/value pair
or read and write them all together.
Is there a rationale for this? Can we change it?
It would be nice to reuse this class to track log flushes, and the cleaner
point. However in its current form it isn't particularly useful.
Jay Kreps 2012-11-27, 01:17
Jay Kreps 2012-11-27, 01:20