Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common
Steve Loughran 2013-05-24, 20:34
On 24 May 2013 01:28, Colin McCabe <[EMAIL PROTECTED]> wrote:
> You might try looking at what KosmoFS (KFS) did. They have some code in
> org/apache/hadoop/fs which calls their own Java shim.
> This way, the shim code in hadoop-common gets updated whenever FileSystem
> changes, but there is no requirement to install KFS before building Hadoop.
Actually, we were backing away from bundling that in there, the main issue
being the inability to run regression tests; it was code coming out from the
ASF marked as "part of hadoop" but we never knew what it did.
The s3 blobstore is in there; I think at some point it would be good to
pull out from the core hadoop-common JAR and put into its own
hadoop-tools/hadoop-aws JAR/project, so that its dependencies (jetS3t)
would be isolated from the main project, keeping transitive pom bloat down,
and allowing it to be a separate installable item.
S3 & Swift are testable, you just need money and/or donated cluster time
from the service providers; the same would hold for google cloud storage,
etc. They are on the net and I can test them from my laptop, even though
latency and bandwidth problems surface there (and, on some of the swift
services, throttling of side-effecting operations, such as a recursive
delete of a very large directory). That remote testing, therefore, helps me
find such pains before they hit the field.
> You might also try asking Steve Loughran, since he did some great work
> recently to try to nail down the exact semantics of FileSystem and
> FileContext and improve the related unit tests (see HADOOP-9258 and related
yeah, though I haven't written those tests yet. Plan is to pull most of the
HADOOP-8545 tests up, use Andrew Wang's wrapper code to make them work with
FileContext too, then add some class which every FS would implement; a
class that would provide a factory for filesystem/filecontext
implementations, and a Conf instance that declares FS capabilities
(has-umask, rmdir-root-test-safe-to-run, is-case-sensitive, max-path,
max-filename, ...). The (subclassed) test can use these values to skip
tests, and tune aspects.
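
To make that plan concrete, here is a rough sketch of what such a per-FS
class could look like. It is only an illustration under assumptions: the
names FsContractSketch, Capabilities, Contract, umaskTest and
blobstoreContract are all hypothetical, a plain map stands in for the Conf
instance, and a String stands in for the filesystem object. None of it is
existing Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every name here is illustrative, not Hadoop API.
final class FsContractSketch {

    /** Key/value capability declarations, standing in for a Conf instance. */
    static final class Capabilities {
        private final Map<String, String> values = new HashMap<>();

        Capabilities set(String key, String value) {
            values.put(key, value);
            return this;
        }

        /** Undeclared boolean capabilities are treated as unsupported. */
        boolean is(String key) {
            return Boolean.parseBoolean(values.get(key));
        }

        /** Numeric limits such as max-path, with a fallback when undeclared. */
        int getInt(String key, int fallback) {
            String v = values.get(key);
            return v == null ? fallback : Integer.parseInt(v);
        }
    }

    /** What each filesystem binding would supply to the shared tests. */
    interface Contract<F> {
        F createFileSystem();         // factory for the implementation under test
        Capabilities capabilities();  // what that implementation supports
    }

    /** A shared test that skips itself when the capability is not declared. */
    static String umaskTest(Contract<?> contract) {
        if (!contract.capabilities().is("has-umask")) {
            return "SKIPPED";
        }
        Object fs = contract.createFileSystem();
        // A real test would exercise fs here; the sketch only checks the factory.
        return fs != null ? "PASSED" : "FAILED";
    }

    /** Hypothetical blobstore binding: no umask, case-sensitive, long paths. */
    static Contract<String> blobstoreContract() {
        Capabilities caps = new Capabilities()
            .set("has-umask", "false")
            .set("is-case-sensitive", "true")
            .set("max-path", "1024");
        return new Contract<String>() {
            public String createFileSystem() { return "fake-blobstore-fs"; }
            public Capabilities capabilities() { return caps; }
        };
    }

    public static void main(String[] args) {
        Contract<String> c = blobstoreContract();
        System.out.println("umask test: " + umaskTest(c));
        System.out.println("max-path: " + c.capabilities().getInt("max-path", 255));
    }
}
```

The point of the split is that the shared tests only ever see the
Contract, so a new filesystem needs one small class rather than its own
copy of the test suite.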
I want to get the swift stuff in before the beta (it's not going to have any
regressions, after all), get feedback on that, and, once the code is
checked in, start on pulling up the tests.