Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Kafka >> mail # dev >> HighwaterMarkCheckpoint

Jay Kreps 2012-11-27, 00:14
Jay Kreps 2012-11-27, 01:17
Copy link to this message
Re: HighwaterMarkCheckpoint
Also I don't see any unit test coverage for this class directly, is that
right? If so I will add it...

I do see two big integration tests HighwatermarkPersistenceTest and
LogRecoveryTest both use it. What is the difference between these?

On Mon, Nov 26, 2012 at 5:17 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> Does this capture the intention of this code:
> https://gist.github.com/4151743
> ?
> The idea is to rename it to OffsetCheckpoint so it is generic. To do this
> 1. I make the file name and argument
> 2. read now returns a Map of all values. To get the old behavior you would
> do chkpt.read.getOrElse(TopicPartition(t,p), 0L). This makes sense as
> defaulting to 0L may only make sense for highwatermarks.
> 3. Throw and exception rather than calling shutdown (always a good idea)
> Some things I am confused about:
> 1. Parsing logic--there is very complex being done but I don't know why.
> Are we just parsing three space-separated values? Why all the indexOf stuff?
> 2. Why do we store the number of entries in the file?
> Cheers,
> -Jay
> On Mon, Nov 26, 2012 at 4:14 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>> The api for the HighwaterMarkCheckpoint is:
>>   def write(highwaterMarksPerPartition: Map[TopicAndPartition, Long])
>>   def read(topic: String, partition: Int): Long
>> Pretty weird, no? The write method writes a map of all values,
>> obliterating previous values. The read method internally reads back that
>> whole map, then throws away all the values but one and returns that one. It
>> seems like the natural way to do it would be to either read and write one
>> key/value pair or read and write them all together.
>> Is there a rationale for this? Can we change it?
>> It would be nice to reuse this class to track log flushes, and the
>> cleaner point. However in its current form it isn't particularly useful.
>> -Jay