-
hadoop FileSystem.close()
Koert Kuipers 2012-07-24, 14:34
Since FileSystem is a Closeable, I would expect code using it to look like this:
FileSystem fs = path.getFileSystem(conf);
try {
    // do something with fs, such as read from the path
} finally {
    fs.close();
}
However, I have repeatedly gotten into trouble with this approach. In one situation it turned out that when I closed a FileSystem, other operations that were using their own FileSystems (pointing to the same real-world HDFS filesystem) also saw their FileSystems closed, leading to very confusing read and write errors. This led me to believe that a FileSystem should never be closed, since it seemed to act like some sort of singleton. However, I was just looking at some code (the Hoop server, to be precise) and noticed that FileSystems were indeed closed there, but they were always thread-local. Is this the right approach?
And if the FileSystem is thread-local, is this safe (assuming fs1 and fs2 could point to the same real-world filesystem)?
FileSystem fs1 = path.getFileSystem(conf);
try {
    FileSystem fs2 = path.getFileSystem(conf);
    try {
        // do something with fs2, such as read from the path
    } finally {
        fs2.close();
    }
    // do something with fs1, such as read from the path
    // (note: fs2 is closed here, and I wouldn't be surprised if fs1
    // is by now also closed, given my experience)
} finally {
    fs1.close();
}
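The symptom described above is what one would expect if both lookups returned the same cached object. Below is a minimal sketch of that situation in plain Java. It is not Hadoop code: `SharedCacheSketch`, `Handle` and `get` are hypothetical names, and the idea that lookups are cached per filesystem key is an assumption for illustration, not something confirmed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a lookup that caches one handle per key.
// This is NOT Hadoop's FileSystem; it only mimics the suspected behaviour.
class SharedCacheSketch {

    static final class Handle implements AutoCloseable {
        private final String key;
        private boolean open = true;

        Handle(String key) {
            this.key = key;
        }

        String read() {
            if (!open) {
                throw new IllegalStateException("handle for " + key + " is closed");
            }
            return "data from " + key;
        }

        boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            // Closing affects every caller holding this same object.
            open = false;
            evict(key);
        }
    }

    private static final Map<String, Handle> CACHE = new HashMap<>();

    // Returns the cached handle for the key, creating it on first use.
    static synchronized Handle get(String key) {
        return CACHE.computeIfAbsent(key, Handle::new);
    }

    // Drops a closed handle so that a later get() creates a fresh one.
    private static synchronized void evict(String key) {
        CACHE.remove(key);
    }

    public static void main(String[] args) {
        Handle fs1 = get("hdfs://namenode");
        Handle fs2 = get("hdfs://namenode");
        System.out.println(fs1 == fs2);   // prints "true": one shared object
        fs2.close();
        System.out.println(fs1.isOpen()); // prints "false": fs1 was closed too
    }
}
```

In this model the nested try/finally pattern fails at the "do something with fs1" step, because `fs1.read()` would throw once `fs2.close()` has run.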
-
Re: hadoop FileSystem.close()
Edward Capriolo 2012-07-24, 14:46
In all my experience you let FileSystem instances close themselves.
-
Re: hadoop FileSystem.close()
Koert Kuipers 2012-07-24, 17:50
My suspicion is that fs.close() closes the FileSystem in the cache, regardless of whether it is being used by other processes as well at that point (as opposed to a system where the cache keeps a count of users and only closes it when the last user asks for a close). Can anyone confirm?
Although in principle there is nothing wrong with this setup, implementing Closeable in this situation is a bit misleading in my opinion.
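For comparison, the alternative mentioned in the parentheses (a cache that counts its users and only closes on the last release) could look roughly like the sketch below. It is plain Java with hypothetical names (`RefCountedCacheSketch`, `Resource`, `acquire`, `release`) and is not a description of how Hadoop implements its cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reference-counted cache: a resource is only closed
// when the last user releases it. Not Hadoop code.
class RefCountedCacheSketch {

    static final class Resource {
        private final String key;
        private int users = 0;
        private boolean open = true;

        Resource(String key) {
            this.key = key;
        }

        boolean isOpen() {
            return open;
        }
    }

    private final Map<String, Resource> cache = new HashMap<>();

    // Hands out the shared resource and records one more user.
    synchronized Resource acquire(String key) {
        Resource r = cache.computeIfAbsent(key, Resource::new);
        r.users++;
        return r;
    }

    // Records one user fewer; closes and evicts when nobody is left.
    synchronized void release(Resource r) {
        if (r.users == 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        if (--r.users == 0) {
            r.open = false;
            cache.remove(r.key);
        }
    }

    public static void main(String[] args) {
        RefCountedCacheSketch registry = new RefCountedCacheSketch();
        Resource fs1 = registry.acquire("hdfs://namenode");
        Resource fs2 = registry.acquire("hdfs://namenode");
        registry.release(fs2);
        System.out.println(fs1.isOpen()); // prints "true": fs1 still has a user
        registry.release(fs1);
        System.out.println(fs1.isOpen()); // prints "false": last user released
    }
}
```

With this design the nested fs1/fs2 pattern from the first message would be safe, at the cost of requiring every acquire to be paired with exactly one release.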