Kafka, mail # user - the cleaner and log segments


Re: the cleaner and log segments
Chris Burroughs 2011-11-23, 16:22
Was that "write an empty log segment" feature always there?

On 11/18/2011 06:39 PM, Joel Koshy wrote:
> Just want to see if I understand this right - when the log cleaner
> does its thing, even if all the segments are eligible for garbage
> collection the cleaner will nuke those files and should deposit an
> empty segment file named with the next valid offset in that partition.
> I think Taylor encountered a case where that empty segment was not
> added. Is this the race condition that you speak of? E.g., if the
> broker crashes before that empty segment file is created...
>
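
For concreteness, a minimal sketch of the behavior described above, assuming the 0.7-era convention that segment files are named with a zero-padded offset and a ".kafka" suffix (the class and helper names are illustrative, not the broker's actual code):

    import java.io.File;
    import java.io.IOException;

    public class EmptySegmentSketch {
        // Segment files are named by the offset of their first message,
        // zero-padded to a fixed width (illustrative formatting).
        static String nameFromOffset(long offset) {
            return String.format("%020d.kafka", offset);
        }

        // After deleting every eligible segment, the cleaner should leave a
        // zero-length segment so the partition's next valid offset remains
        // recoverable from the file name alone.
        static File createEmptySegment(File logDir, long nextOffset) throws IOException {
            File segment = new File(logDir, nameFromOffset(nextOffset));
            if (!segment.createNewFile()) {
                throw new IOException("Segment already exists: " + segment);
            }
            return segment;
        }
    }

If the broker crashes after the deletes but before this file is created, the next valid offset is lost - which is exactly the race Joel asks about.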
> Also, I have seen the log cleaner act up more than once in the past -
> it basically seems to get scheduled continuously and delete file 0000...
> I think someone else on the list saw that before. I have been unable
> to reproduce that though - and it is not impossible that there was a
> misconfiguration at play.
>
> Thanks,
>
> Joel
>
> On Fri, Nov 18, 2011 at 11:50 AM, Taylor Gautier <[EMAIL PROTECTED]> wrote:
>> Ok, that's what we are already doing.  In essence, when that happens it
>> is a bit like a rollover. Except that, depending on the values, a
>> consumer might hold a low enough offset that when it requests the
>> topic, the offset is still within range but is no longer valid, since
>> new messages were delivered to the broker in the meantime. Essentially
>> it's a race condition that might be somewhat hard to induce but is
>> theoretically possible. With a true 64-bit rollover this is more or
>> less never going to happen, because 64 bits is just too large to open
>> a time window long enough for the race to occur.
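
To put a number on "64 bits is just too large": a quick back-of-the-envelope check, assuming offsets are 64-bit byte positions and a hypothetical sustained append rate of 1 GB/s to a single partition:

    public class OffsetWraparound {
        public static void main(String[] args) {
            // Hypothetical sustained append rate of 1 GB/s to one partition.
            // Offsets are 64-bit byte positions, so the offset space is
            // exhausted only after 2^63 bytes have been written.
            double bytesPerSecond = 1e9;
            double secondsToWrap = Math.pow(2, 63) / bytesPerSecond;
            double yearsToWrap = secondsToWrap / (365.25 * 24 * 3600);
            System.out.printf("Years until wraparound at 1 GB/s: %.0f%n", yearsToWrap);
        }
    }

This prints roughly 292 years, which is why the race window effectively never opens once offsets are 64 bits wide.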
>>
>> On Nov 18, 2011, at 10:32 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>
>>> Taylor,
>>>
>>> If you request an offset whose corresponding log file has been deleted, you
>>> will get an OffsetOutOfRange exception. When this happens, you can use the
>>> getLatestOffset API in SimpleConsumer to obtain the current smallest or
>>> largest valid offset and reconsume from there. Our high-level consumer
>>> does that for you (among many other things). That's why we encourage
>>> most users to use the high-level API instead.
>>>
>>> Thanks,
>>>
>>> Jun
>>
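
For reference, a minimal sketch of the recovery pattern Jun describes, written against the 0.7-era SimpleConsumer Java API as best recalled (the FetchRequest and getOffsetsBefore signatures may differ slightly across versions; the "getLatestOffset" Jun mentions refers to this offset-lookup facility):

    import kafka.api.FetchRequest;
    import kafka.api.OffsetRequest;
    import kafka.common.OffsetOutOfRangeException;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;

    public class OutOfRangeRecovery {
        public static void main(String[] args) {
            // Host, port, socket timeout, and buffer size are placeholders.
            SimpleConsumer consumer = new SimpleConsumer("localhost", 9092, 10000, 64 * 1024);
            String topic = "test";
            int partition = 0;
            long offset = 0L;

            while (true) {
                try {
                    ByteBufferMessageSet messages =
                            consumer.fetch(new FetchRequest(topic, partition, offset, 1024 * 1024));
                    // Depending on the version, an invalid offset may surface
                    // here or only when the returned message set is iterated.
                    // ... consume messages and advance `offset` ...
                } catch (OffsetOutOfRangeException e) {
                    // The offset points into a deleted segment: ask the broker
                    // for the smallest currently valid offset and resume there.
                    // (Use OffsetRequest.LatestTime() instead to skip ahead to
                    // the newest data.)
                    long[] valid = consumer.getOffsetsBefore(
                            topic, partition, OffsetRequest.EarliestTime(), 1);
                    offset = valid[0];
                }
            }
        }
    }

The high-level consumer wraps this reset logic behind its offset-reset configuration, which is part of why it is the recommended API for most users.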