Flume >> mail # dev >> File Channel issue - recovering from BadCheckpoint exception


Re: File Channel issue - recovering from BadCheckpoint exception
I think we could add JUnit Assume statements to any tests that depend on
this value, since it will be automatically disabled on Windows.
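
A minimal sketch of that idea in plain Java, for illustration only. The class and method names (`AutoDeleteSupport`, `isWindows`, `effectiveAutoDelete`) are hypothetical and not part of Flume; the sketch assumes the auto-deletion setting is a boolean that gets forced off when the `os.name` system property starts with "Windows".

```java
import java.util.Locale;

public class AutoDeleteSupport {
    // True when the given os.name value denotes a Windows platform.
    static boolean isWindows(String osName) {
        return osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Honor the configured value, but force auto-deletion off on Windows
    // (assumption: this mirrors the "auto disabled on Windows" behavior).
    static boolean effectiveAutoDelete(boolean configured, String osName) {
        return configured && !isWindows(osName);
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println("auto-delete enabled: "
                + effectiveAutoDelete(true, os));
    }
}
```

A JUnit 4 test that depends on auto-deletion could then start with `org.junit.Assume.assumeTrue(!AutoDeleteSupport.isWindows(System.getProperty("os.name")));` so that it is reported as skipped, rather than failed, on Windows.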
On Fri, May 31, 2013 at 6:15 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

> I am not sure how this is generally handled by Windows developers, but I'd
> assume there is a way to do that. I am fairly sure this is a known issue. I
> think the only thing we can do for now is to disable those unit tests if
> the build is on Windows, or have an if-else that tests the expected behavior
> on Windows. I don't really like having different behavior on Windows and
> POSIX platforms, but if the platform itself behaves in a specific way, I
> doubt there is anything we can do.
>
> In case of the dual checkpoints, we might be ok - because we actually
> don't open the files. We just create them and then copy the content and
> then close them.
>
>
> Cheers,
> Hari
>
>
> On Friday, May 31, 2013 at 4:01 PM, Roshan Naik wrote:
>
> > I am concerned that several unit tests might depend on the auto-deletion.
> >
> >
> > On Fri, May 31, 2013 at 3:57 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> >
> >
> > > Roshan,
> > >
> > > > No, that would break all config files from Flume 1.3.0 and Flume 1.3.1. We
> > > > should probably have some code that specifically disables this on Windows
> > > > and clearly document that.
> > >
> > >
> > > Cheers,
> > > Hari
> > >
> > >
> > > On Friday, May 31, 2013 at 3:51 PM, Roshan Naik wrote:
> > >
> > > > > Would it make sense for the default config setting for auto-deletion to be
> > > > > set to 'false', then?
> > > >
> > > >
> > > > > On Fri, May 31, 2013 at 3:16 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > > > > For now, how about making the auto-deletion configurable? If it is
> > > > > > configured not to delete, then don't even try to start up the channel. This
> > > > > > will bring back the pre-1.3.0 behavior, where the channel's recovery is
> > > > > > manual. I suspect you are going to hit many more issues when you enable
> > > > > > dual checkpoints - and fixing that is going to be non-trivial.
> > > > >
> > > > > Cheers,
> > > > > Hari
> > > > >
> > > > >
> > > > > On Friday, May 31, 2013 at 2:53 PM, Roshan Naik wrote:
> > > > >
> > > > > > > In the EventQueueBackingStoreFileV3 constructor, if it detects that the
> > > > > > > checkpoint and meta files have differing logWriteOrderIds, it throws a
> > > > > > > BadCheckpointException. Control goes back to the exception handler in
> > > > > > > Log.replay(), which attempts to delete all the files in the checkpoint
> > > > > > > directory and start fresh. The same file names are reused when starting
> > > > > > > fresh.
> > > > > > >
> > > > > > > Unfortunately this does not work on Windows, since the deletion of
> > > > > > > the checkpoint file in the checkpointDir fails. The failure is due to the
> > > > > > > fact that the checkpoint file is memory mapped. Unless it is unmapped, the
> > > > > > > deletion will not succeed... and unfortunately Java does not have unmap
> > > > > > > support. Windows does not permit deletion (or renaming) of files in use.
> > > > > > >
> > > > > > > The obvious thought I am having is that when starting fresh we delete
> > > > > > > whatever we can and invent a new file name for the ones we can't (I think
> > > > > > > for the checkpoint file only).
> > > > > > >
> > > > > > > Thoughts?
> > > > > >
> > > > > > -roshan
>
>
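
The "delete whatever we can and invent a new file name for the ones we can't" proposal from the original message could look roughly like the sketch below. It is an illustration under stated assumptions, not Flume code: `CheckpointFileChooser` and `freshFile` are hypothetical names, the numeric-suffix naming scheme is invented here, and the delete step is passed in as a function so that a failed deletion (as with a memory-mapped file on Windows) can be simulated on any platform.

```java
import java.io.File;
import java.util.function.Predicate;

public class CheckpointFileChooser {
    // Tries to delete 'existing' with the supplied deleter. If the file is
    // gone afterwards, the original name is reused. Otherwise the first
    // unused sibling name of the form "<name>.<n>" is returned.
    static File freshFile(File existing, Predicate<File> deleter) {
        if (!existing.exists() || deleter.test(existing)) {
            return existing;
        }
        File parent = existing.getParentFile();
        for (int i = 1; ; i++) {
            File candidate = new File(parent, existing.getName() + "." + i);
            if (!candidate.exists()) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File checkpoint = File.createTempFile("checkpoint", ".tmp");
        // File::delete returns false when the platform refuses the deletion.
        File next = freshFile(checkpoint, File::delete);
        System.out.println("next checkpoint file: " + next);
    }
}
```

A real change would also need some way to find the renamed checkpoint file on the next startup and to clean up the stale one later; neither is covered here.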
--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org