Re: File Channel issue - recovering from BadCheckpoint exception
I am not sure how this is generally handled by Windows developers, but I'd assume there is a way to do it. I am fairly sure this is a known issue. I think the only thing we can do for now is to disable those unit tests if the build is on Windows, or have an if-else that tests the expected behavior on Windows. I don't really like having different behavior on Windows and POSIX platforms, but if the platform itself behaves in a specific way, I doubt there is anything we can do.
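
(As an illustration of the per-platform test guard mentioned above, a minimal sketch using JUnit 4's Assume; the test class, method name, and isWindows helper are made up for the example and are not actual Flume tests:)

import static org.junit.Assume.assumeFalse;

import org.junit.Test;

public class FileChannelWindowsSkipTest {

  private static boolean isWindows() {
    return System.getProperty("os.name").toLowerCase().startsWith("windows");
  }

  @Test
  public void testCheckpointAutoDeletion() throws Exception {
    // Skip rather than fail on Windows, where the mapped checkpoint file
    // cannot be deleted while the channel still holds it.
    assumeFalse(isWindows());
    // ... exercise the auto-deletion path on POSIX platforms ...
  }
}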

In the case of the dual checkpoints, we might be OK, because we don't actually keep the files open: we just create them, copy the content over, and then close them.
Cheers,
Hari
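
(A rough sketch of that create/copy/close pattern for the backup checkpoint, assuming Java 7's java.nio.file is available; the path names are illustrative, not the channel's real configuration:)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CheckpointCopy {
  public static void main(String[] args) throws IOException {
    // Illustrative paths only; the real directories come from the channel config.
    Path checkpoint = Paths.get("checkpointDir", "checkpoint");
    Path backup = Paths.get("backupCheckpointDir", "checkpoint");
    // Files.copy streams the bytes and closes both files afterwards,
    // so no memory mapping is left holding the backup open.
    Files.copy(checkpoint, backup, StandardCopyOption.REPLACE_EXISTING);
  }
}
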
On Friday, May 31, 2013 at 4:01 PM, Roshan Naik wrote:

> i am concerned several unit tests might be dependent on the auto-deletion.
>
>
> On Fri, May 31, 2013 at 3:57 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
>
>
> > Roshan,
> >
> > No, that would break all config files from Flume 1.3.0 and Flume 1.3.1. We
> > should probably have some code that specifically disables this on Windows
> > and clearly document that.
> >
> >
> > Cheers,
> > Hari
> >
> >
> > On Friday, May 31, 2013 at 3:51 PM, Roshan Naik wrote:
> >
> > > Would it make sense for the default config setting for the auto-deletion to
> > > be set to 'false' then?
> > >
> > >
> > > On Fri, May 31, 2013 at 3:16 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >
> > > > For now, how about making the auto-deletion configurable? If it is
> > > > configured not to delete, then don't even try to start up the channel. This
> > > > will bring in the pre-1.3.0 behavior where the channel's recovery is
> > > > manual? I suspect you are going to hit many more issues when you enable
> > > > dual checkpoints - and fixing that is going to be non-trivial.
> > > >
> > > > Cheers,
> > > > Hari
> > > >
> > > >
> > > > On Friday, May 31, 2013 at 2:53 PM, Roshan Naik wrote:
> > > >
> > > > > In the EventQueueBackingStoreFileV3 constructor, if it detects that the
> > > > > checkpoint and meta files have differing logWriteOrderIds, it throws a
> > > > > BadCheckpointException. Control goes back to the exception handler in
> > > > > Log.replay(), which attempts to delete all the files in the checkpoint
> > > > > directory and start fresh. The same file names are reused when starting fresh.
> > > > >
> > > > > Unfortunately this does not work on Windows, since the deletion of
> > > > > the checkpoint file in the checkpointDir fails. The failure is due to the
> > > > > fact that the checkpoint file is memory mapped. Unless it is unmapped, the
> > > > > deletion will not succeed... and unfortunately Java does not have unmap
> > > > > support. Windows does not permit deletion (or renaming) of files in use.
> > > > >
> > > > > The obvious thought I am having is that when starting fresh we delete
> > > > > whatever we can and invent a new file name for the ones we can't (I think
> > > > > for the checkpoint file only).
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > -roshan
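
(A hedged sketch of the "delete what we can, invent a new name for the rest" idea proposed above; the method and the suffix-based naming scheme are hypothetical, not existing Flume code:)

import java.io.File;

public class CheckpointRecovery {

  // Try to clear the checkpoint directory; if a file cannot be deleted
  // (e.g. the still-mapped checkpoint on Windows), fall back to an unused
  // file name instead of reusing the old one.
  static File prepareFreshCheckpoint(File checkpointDir, String baseName) {
    File[] existing = checkpointDir.listFiles();
    if (existing != null) {
      for (File f : existing) {
        // On Windows this fails for the memory-mapped checkpoint file.
        f.delete();
      }
    }
    // Reuse the old name if it is gone; otherwise pick a new suffixed name.
    File candidate = new File(checkpointDir, baseName);
    int suffix = 0;
    while (candidate.exists()) {
      suffix++;
      candidate = new File(checkpointDir, baseName + "." + suffix);
    }
    return candidate;
  }
}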