Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Kafka, mail # dev - HighwaterMarkCheckpoint


+
Jay Kreps 2012-11-27, 00:14
Copy link to this message
-
Re: HighwaterMarkCheckpoint
Jay Kreps 2012-11-27, 01:17
Does this capture the intention of this code:
https://gist.github.com/4151743
?

The idea is to rename it to OffsetCheckpoint so it is generic. To do this
1. I make the file name and argument
2. read now returns a Map of all values. To get the old behavior you would
do chkpt.read.getOrElse(TopicPartition(t,p), 0L). This makes sense as
defaulting to 0L may only make sense for highwatermarks.
3. Throw and exception rather than calling shutdown (always a good idea)

Some things I am confused about:
1. Parsing logic--there is very complex being done but I don't know why.
Are we just parsing three space-separated values? Why all the indexOf stuff?
2. Why do we store the number of entries in the file?

Cheers,

-Jay
On Mon, Nov 26, 2012 at 4:14 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> The api for the HighwaterMarkCheckpoint is:
>
>   def write(highwaterMarksPerPartition: Map[TopicAndPartition, Long])
>
>   def read(topic: String, partition: Int): Long
>
> Pretty weird, no? The write method writes a map of all values,
> obliterating previous values. The read method internally reads back that
> whole map, then throws away all the values but one and returns that one. It
> seems like the natural way to do it would be to either read and write one
> key/value pair or read and write them all together.
>
> Is there a rationale for this? Can we change it?
>
> It would be nice to reuse this class to track log flushes, and the cleaner
> point. However in its current form it isn't particularly useful.
>
> -Jay
>
+
Jay Kreps 2012-11-27, 01:20