Flume >> mail # dev >> File Channel issue - recovering from BadCheckpoint exception

Roshan Naik 2013-05-31, 21:53
Re: File Channel issue - recovering from BadCheckpoint exception
For now, how about making the auto-deletion configurable? If it is configured not to delete, then don't even try to start up the channel. This would bring back the pre-1.3.0 behavior, where the channel's recovery is manual. I suspect you are going to hit many more issues when you enable dual checkpoints, and fixing those is going to be non-trivial.
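
To make the suggestion concrete, here is a rough sketch of what a configurable auto-delete could look like. It is not the actual Log.replay() code: the class ReplayPolicy, the autoDelete flag, the Outcome enum, and the two Runnable hooks are all hypothetical names, and BadCheckpointException is redefined locally only so that the example compiles on its own.

```java
class ReplayPolicy {
    /** Local stand-in for Flume's BadCheckpointException (assumption: unchecked). */
    static class BadCheckpointException extends RuntimeException {
        BadCheckpointException(String msg) { super(msg); }
    }

    /** Hypothetical result type, used only to make the sketch observable. */
    enum Outcome { STARTED, STARTED_AFTER_RESET }

    /**
     * Tries to load the checkpoint. On a bad checkpoint, either wipes the
     * checkpoint directory and continues (autoDelete == true) or refuses to
     * start so that recovery stays manual (autoDelete == false).
     */
    static Outcome replay(boolean autoDelete,
                          Runnable loadCheckpoint,
                          Runnable resetCheckpointDir) {
        try {
            loadCheckpoint.run();
            return Outcome.STARTED;
        } catch (BadCheckpointException e) {
            if (!autoDelete) {
                // Configured not to delete: do not even try to start the channel.
                throw new IllegalStateException(
                    "Bad checkpoint; manual recovery required", e);
            }
            // Configured to delete: start fresh and carry on.
            resetCheckpointDir.run();
            return Outcome.STARTED_AFTER_RESET;
        }
    }
}
```

With autoDelete set to false, the caller sees an IllegalStateException and the checkpoint directory is left untouched for an operator to inspect.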

On Friday, May 31, 2013 at 2:53 PM, Roshan Naik wrote:

> In EventQueueBackingStoreFileV3 constructor, if it detects that the
> checkpoint and meta files have differing logWriteOrderIds, it throws a
> BadCheckpointException. Control goes back to the exception handler in
> Log.replay() which attempts to delete all the files in checkpoint directory
> and start fresh. The same file names are reused when starting fresh.
> Unfortunately this does not work on Windows since the deletion of
> the checkpoint file in the checkpointDir fails. The failure is due to the
> fact that the checkpoint file is memory mapped. Unless it is unmapped the
> deletion will not succeed... and unfortunately Java does not have unmap
> support. Windows does not permit deletion (or renaming) of files in use.
> The obvious thought I have is that when starting fresh we delete
> whatever we can and invent a new file name for the ones we can't (I think
> that is the checkpoint file only).
> Thoughts?
> -roshan
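
The "delete whatever we can and invent a new file name" idea from the quoted message could be sketched roughly as follows. This is only an illustration under stated assumptions: the class CheckpointReset, both method names, and the numeric-suffix naming scheme are hypothetical and not part of Flume, and the sketch does not address how a later restart would discover which file name is current.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class CheckpointReset {
    /**
     * Attempts to delete every entry directly inside dir and returns the ones
     * that could not be deleted (for example, a file that is still memory
     * mapped on Windows). File.delete() reports failure by returning false.
     */
    static List<File> deleteWhatWeCan(File dir) {
        List<File> undeletable = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return undeletable; // dir is missing or not a directory
        }
        for (File f : files) {
            if (!f.delete()) {
                undeletable.add(f);
            }
        }
        return undeletable;
    }

    /**
     * Returns a file in dir whose name is not in use: baseName itself if it
     * is free, otherwise baseName.1, baseName.2, and so on.
     */
    static File freshCheckpointFile(File dir, String baseName) {
        File candidate = new File(dir, baseName);
        int suffix = 1;
        while (candidate.exists()) {
            candidate = new File(dir, baseName + "." + suffix++);
        }
        return candidate;
    }
}
```

A caller would first run deleteWhatWeCan on the checkpoint directory and then ask freshCheckpointFile for the name to use, so a leftover undeletable file simply pushes the new checkpoint to the next free name.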

Roshan Naik 2013-05-31, 22:51
Hari Shreedharan 2013-05-31, 22:57
Roshan Naik 2013-05-31, 23:01
Hari Shreedharan 2013-05-31, 23:15
Brock Noland 2013-06-01, 02:08