Kafka user mailing list

Re: the cleaner and log segments
Was that "write an empty log segment" feature always there?

On 11/18/2011 06:39 PM, Joel Koshy wrote:
> Just want to see if I understand this right - when the log cleaner
> does its thing, even if all the segments are eligible for garbage
> collection the cleaner will nuke those files and should deposit an
> empty segment file named with the next valid offset in that partition.
> I think Taylor encountered a case where that empty segment was not
> added. Is this the race condition that you speak of? For example, if the
> broker crashes before that empty segment file is created...
> Also, I have seen the log cleaner act up more than once in the past:
> it basically seemed to get scheduled continuously and delete file 0000...
> I think someone else on the list saw that before. I have been unable
> to reproduce that though - and it is not impossible that there was a
> misconfiguration at play.
> Thanks,
> Joel
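A minimal Python sketch of the cleaner behavior Joel describes, under a simplified model that is an assumption rather than Kafka's actual code: each segment is identified by its starting offset, offsets are byte positions (so the next valid offset is start + size), and expired segments are deleted oldest-first. The class and function names, the retention check, and the 20-digit file name are all hypothetical. The important step is the last one: when every segment is eligible, one empty segment is left at the next valid offset. If the broker crashed between deleting the old files and creating that empty one, the partition would come back with no segments and offsets would start over at 0, which is the case raised in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: int    # first (byte) offset stored in this segment
    size: int     # number of bytes in the segment
    mtime: float  # last-modified time, in seconds

    @property
    def next_offset(self) -> int:
        # Offset the next appended message would receive (byte-offset model).
        return self.start + self.size

    @property
    def name(self) -> str:
        # Hypothetical file name: the start offset zero-padded to 20 digits.
        return f"{self.start:020d}"


def clean(segments: List[Segment], now: float, retention_s: float) -> List[Segment]:
    """Delete expired segments, oldest first, without ever emptying the log."""
    if not segments:
        return []
    i = 0
    while i < len(segments) and now - segments[i].mtime > retention_s:
        i += 1
    if i < len(segments):
        return segments[i:]  # some segments survive; keep them unchanged
    # Every segment was eligible: leave one empty segment at the next valid
    # offset so the partition's offsets keep growing instead of resetting to 0.
    return [Segment(start=segments[-1].next_offset, size=0, mtime=now)]
```

For example, with segments starting at 0 (100 bytes) and 100 (50 bytes), both older than the retention period, `clean()` returns a single empty segment starting at offset 150, named `00000000000000000150`.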
> On Fri, Nov 18, 2011 at 11:50 AM, Taylor Gautier <[EMAIL PROTECTED]> wrote:
>> Ok, that's what we are already doing. In essence, when that happens it
>> is a bit like a rollover. Except, depending on the values, a consumer
>> might hold a low enough offset that when it requests the topic, the
>> offset is still within range but is not valid, since new messages
>> have been delivered to the broker. Essentially it's a race condition
>> that might be somewhat hard to induce but is theoretically possible.
>> With a 64-bit rollover this is more or less never going to happen,
>> because 64 bits is just too large a space to open a time window long
>> enough for the race to occur.
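The race Taylor describes can be illustrated with a small Python sketch. The assumptions here are not from the thread: the broker accepts a fetch offset `o` whenever `log_start <= o <= log_end`, and the consumer only resets when its offset is rejected. The function name, the stale offset, and the sizes are hypothetical. If the empty segment is lost and the log restarts at 0, a stale consumer offset is rejected only until enough new data arrives; after that it falls back inside the valid range and is silently accepted. If the empty segment is created at the next valid offset, the stale offset stays below `log_start` and always forces a reset.

```python
def fetch_decision(offset: int, log_start: int, log_end: int) -> str:
    """What a consumer holding `offset` ends up doing (simplified model)."""
    if log_start <= offset <= log_end:
        return "fetch"  # accepted, whether or not it is still meaningful
    return "reset"      # out of range: consumer must pick a new offset


STALE = 120  # hypothetical offset the consumer saved before the cleanup

# Case 1: the empty segment was never created, so the log restarted at 0.
print(fetch_decision(STALE, 0, 0))      # -> reset (nothing written yet)
print(fetch_decision(STALE, 0, 200))    # -> fetch (200 new bytes arrived)

# Case 2: the cleaner left an empty segment at the next valid offset, 150.
print(fetch_decision(STALE, 150, 350))  # -> reset (same 200 new bytes)
```

The second call is the problematic one: the fetch at 120 is accepted, but the consumer skips the first 120 bytes of new data and, if offsets are byte positions as assumed here, may not even land on a message boundary.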
>> On Nov 18, 2011, at 10:32 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>> Taylor,
>>> If you request an offset whose corresponding log file has been deleted, you
>>> will get an OutOfRange exception. When this happens, you can use the
>>> getLatestOffset API in SimpleConsumer to obtain either the current valid
>>> smallest or largest offset and reconsume from there. Our high-level
>>> consumer does that for you (among many other things). That's why we
>>> encourage most users to use the high-level API instead.
>>> Thanks,
>>> Jun
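Jun's suggested recovery (on an out-of-range error, look up the current smallest or largest valid offset and reconsume from there) can be sketched in Python against an in-memory stand-in for a partition. `FakePartition`, `OffsetOutOfRange`, `earliest()`, `latest()`, `fetch()`, and `consume()` are hypothetical names for illustration only; they are not the SimpleConsumer API. To keep the example short, offsets here are message indices rather than byte positions.

```python
from typing import List, Tuple


class OffsetOutOfRange(Exception):
    """Hypothetical stand-in for the broker's out-of-range error."""


class FakePartition:
    def __init__(self, start: int, messages: List[str]):
        self.start = start        # smallest valid offset still on disk
        self.messages = messages  # message at offset start + i is messages[i]

    def earliest(self) -> int:
        return self.start

    def latest(self) -> int:
        return self.start + len(self.messages)

    def fetch(self, offset: int, max_msgs: int = 10) -> List[str]:
        if not self.earliest() <= offset <= self.latest():
            raise OffsetOutOfRange(offset)
        i = offset - self.start
        return self.messages[i:i + max_msgs]


def consume(p: FakePartition, offset: int,
            reset: str = "smallest") -> Tuple[int, List[str]]:
    """Fetch from `offset`; if it is out of range, reset and retry once."""
    if reset not in ("smallest", "largest"):
        raise ValueError(f"unknown reset policy: {reset}")
    try:
        return offset, p.fetch(offset)
    except OffsetOutOfRange:
        offset = p.earliest() if reset == "smallest" else p.latest()
        return offset, p.fetch(offset)
```

For a partition whose earliest valid offset is 5, `consume(p, 1)` falls back to offset 5 and reconsumes everything still on disk, while `consume(p, 1, reset="largest")` jumps to the log end and returns an empty batch. Resetting to the smallest offset favors not losing data at the cost of reprocessing; resetting to the largest favors freshness.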