Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase, mail # user - CleanerChore exception


Copy link to this message
-
Re: CleanerChore exception
Jesse Yates 2012-12-31, 00:13
Hey,

So the point of all the delete code in the cleaner is to try and delete
each of the files in the directory and then delete the directory, assuming
its empty- it shouldn't leak the IOException if it the directory is found
to be empty and then gets a file added.

This is really odd though, as failures should return false, not throw an
exception (boo HDFS javadocs). Looking at the 0.94 and 0.96 code, it its
just logged, which it annoying, but doesn't mean broken code.

Otherwise, Jean-Marc's analysis looks right. Should be a simple fix. I
filed HBASE-7465 and should have a patch up shortly.

As an aside, this method is actually tested (in a somewhat roundabout way)
in TestCleanerChore#testCleanerDoesNotDeleteDirectoryWithLateAddedFiles
with a spy object that ensures we get this non-error case.

-Jesse
-------------------
Jesse Yates
@jesse_yates
jyates.github.com
On Sun, Dec 30, 2012 at 11:50 AM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:

> The Javadoc is saying:
>
> "@return <tt>true</tt> if the directory was deleted, <tt>false</tt>
> otherwise"
>
> So I think the line "return canDeleteThis ? fs.delete(toCheck, false)
> : false;" is still correct. It's retuning false if the directory has
> not been deleted.
>
> There is no exception here. If the TTL for a file had not expired, the
> file can't be deleted and false is returned. I think it's correct
> behaviour.
>
> The idea of not passing "true" for the recursivity is explained on the
> comments:
>     // if all the children have been deleted, then we should try to
> delete this directory. However,
>     // don't do so recursively so we don't delete files that have been
> added since we checked.
> And I think it's good. So the issue is really when the directory is
> empty and listStatus is sending back null. Then if (children == null)
> return true; is simply returning true without deleting the current
> directory.
>
> This should be changed by something like
> if (children == null) return fs.delete(toCheck, false);
> Which will try to delete the current directory, return true or false
> if possible or not, and throw an expection if there is any issue with
> the FS...
>
> I have done some modifications. I'm compiling and will deploy the
> updated version on my local cluster soon. I will keep you posted on
> the result.
>
> JM
>
> 2012/12/30, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
> > Thanks for the confirmation.
> >
> > Also, seems that there is no test class related to
> > checkAndDeleteDirectory. It might be good to add that too.
> >
> > I have extracted 0.94.3 0.94.4RC0 and the trunk and they are all
> > identical for this methode.
> >
> > I will try to do some modifications and see the results...
> >
> > So far there is 2 options. One is to change the "return null" to
> > handle the current empty directory, and another one is to call
> > fs.delete() directly from checkAndDeleteDirectory instead of the
> > existing code.
> >
> > Will wait for Jesse's feedback.
> >
> > JM
> >
> > 2012/12/30, Ted Yu <[EMAIL PROTECTED]>:
> >> Thanks for the digging. This concurs with my suspicion in the beginning.
> >>
> >> I am copying Jesse who wrote the code. He should have more insight on
> >> this.
> >>
> >> After his confirmation, you can log a JIRA.
> >>
> >> Cheers
> >>
> >> On Sun, Dec 30, 2012 at 10:59 AM, Jean-Marc Spaggiari <
> >> [EMAIL PROTECTED]> wrote:
> >>
> >>> So. Looking deeper I found few things.
> >>>
> >>> First, why checkAndDeleteDirectory is not "simply" calling
> >>> FSUtils.delete (fs, toCheck, true)? I guess it's doing the same thing?
> >>>
> >>> Also, FSUtils.listStatus(fs, toCheck, null); will return null if there
> >>> is no status. Not just an empty array. And it's returning null, we
> >>> will exit without calling the delete methode.
> >>>
> >>> I tried to manually create a file on one of those directories. The
> >>> exception disapears for 300 seconds because of the TTL for the newly
> >>> created file. After 300 seconds, the file I pushed AND the directory