Flume, mail # dev - File Channel issue - recovering from BadCheckpoint exception


Re: File Channel issue - recovering from BadCheckpoint exception
Hari Shreedharan 2013-05-31, 22:16
For now, how about making the auto-deletion configurable? If it is configured not to delete, then don't even try to start up the channel. This would bring back the pre-1.3.0 behavior, where the channel's recovery is manual. I suspect you are going to hit many more issues when you enable dual checkpoints, and fixing that is going to be non-trivial.
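
A rough sketch of what such a switch could look like, just to make the idea concrete. Everything here is hypothetical: the class name, the exception, and the autoDeleteOnBadCheckpoint flag are made up for illustration and are not existing Flume names or configuration keys.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch only; none of these names exist in Flume.
class CheckpointRecovery {

    /** Signals that the operator has to clean up the checkpoint directory by hand. */
    static class ManualRecoveryRequiredException extends IOException {
        ManualRecoveryRequiredException(String message) {
            super(message);
        }
    }

    // Hypothetical config flag: true = delete and start fresh, false = refuse to start.
    private final boolean autoDeleteOnBadCheckpoint;

    CheckpointRecovery(boolean autoDeleteOnBadCheckpoint) {
        this.autoDeleteOnBadCheckpoint = autoDeleteOnBadCheckpoint;
    }

    /**
     * Intended to be called once a bad checkpoint has been detected.
     * Returns the number of files deleted, or throws if auto-deletion is disabled.
     */
    int onBadCheckpoint(File checkpointDir) throws IOException {
        if (!autoDeleteOnBadCheckpoint) {
            // Manual-recovery mode: do not touch any files, do not start the channel.
            throw new ManualRecoveryRequiredException(
                "Bad checkpoint in " + checkpointDir + "; auto-deletion is disabled");
        }
        File[] files = checkpointDir.listFiles();
        if (files == null) {
            throw new IOException("Cannot list " + checkpointDir);
        }
        int deleted = 0;
        for (File f : files) {
            // delete() returns false on failure, e.g. for a file that is still in use.
            if (f.isFile() && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

With the flag set to false the files are left exactly as they were, so whoever recovers the channel manually can still inspect them.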

Cheers,
Hari
On Friday, May 31, 2013 at 2:53 PM, Roshan Naik wrote:

> In the EventQueueBackingStoreFileV3 constructor, if it detects that the
> checkpoint and meta files have differing logWriteOrderIds, it throws a
> BadCheckpointException. Control goes back to the exception handler in
> Log.replay(), which attempts to delete all the files in the checkpoint
> directory and start fresh. The same file names are reused when starting fresh.
>
> Unfortunately this does not work on Windows, since the deletion of
> the checkpoint file in the checkpointDir fails. The failure is due to the
> fact that the checkpoint file is memory-mapped. Unless it is unmapped, the
> deletion will not succeed, and unfortunately Java does not have explicit
> unmap support. Windows does not permit deletion (or renaming) of files in use.
>
> The obvious thought I am having is that when starting fresh we delete
> whatever we can and invent a new file name for the ones we can't (I think
> that is the checkpoint file only).
>
> Thoughts?
>
> -roshan
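
For reference, Roshan's "delete whatever we can and invent a new file name" idea could be sketched roughly as below. This is an illustration under assumptions, not a patch: the class name, the prepare method, and the numeric-suffix naming scheme are all hypothetical, and the delete step is passed in as a Predicate so that a failing delete (as on Windows with a mapped file) can be simulated on any platform.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch only; not Flume code.
class FreshCheckpointFiles {

    /**
     * For each logical file name, returns the file to use when starting fresh.
     * If the old file is absent or could be deleted, the same name is reused.
     * If deletion fails, a new unused name is invented instead.
     */
    static Map<String, File> prepare(File dir, String[] names, Predicate<File> deleter) {
        Map<String, File> result = new LinkedHashMap<>();
        for (String name : names) {
            File original = new File(dir, name);
            if (!original.exists() || deleter.test(original)) {
                result.put(name, original);
                continue;
            }
            // Could not delete (for example, still memory-mapped): pick name.1, name.2, ...
            int suffix = 1;
            File alternative;
            do {
                alternative = new File(dir, name + "." + suffix);
                suffix++;
            } while (alternative.exists());
            result.put(name, alternative);
        }
        return result;
    }
}
```

In real use the deleter would simply be File::delete. The sketch does not cover how the channel would find the invented name again on the next restart, or when the undeletable old file eventually gets cleaned up.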