I am currently looking into other filesystem implementations, such as
Lustre, Gluster, or other NAS with POSIX support, and trying to replace
HDFS with one of them.
I have implemented a filesystem class (AFS) that provides the interface
Hadoop MapReduce expects, similar to RawLocalFileSystem, and examples like
wordcount and terasort work well.
However, I am not sure whether my implementation is correct for all the
MapReduce applications that Hadoop MapReduce + Hadoop HDFS can run.
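
To make the correctness concern concrete, here is a minimal sketch of the
kind of check I have in mind. It is only an illustration: it assumes the
filesystem is mounted at a POSIX path (a temporary directory stands in for
the mount point here), and the function name, scratch directory, and file
names are hypothetical, not part of AFS or Hadoop. It exercises the basic
operations that Hadoop's FileSystem API also exposes (mkdirs, create,
rename, list, delete), loosely modeled on a job writing task output to a
temporary directory and renaming it into place.

```python
import os
import shutil
import tempfile


def check_posix_semantics(root):
    """Run a few create/rename/list/delete checks under `root`.

    Returns a dict mapping check name to True/False.
    """
    results = {}
    base = os.path.join(root, "fs_smoke_check")  # hypothetical scratch dir
    attempt = os.path.join(base, "_temporary", "attempt_0")
    final = os.path.join(base, "output")
    try:
        # mkdirs: create nested directories in one call
        os.makedirs(attempt)
        results["mkdirs"] = os.path.isdir(attempt)

        # create + write, then flush to storage (roughly what hsync asks for)
        part = os.path.join(attempt, "part-00000")
        with open(part, "wb") as f:
            f.write(b"hello\n")
            f.flush()
            os.fsync(f.fileno())
        results["write_visible"] = os.path.getsize(part) == 6

        # rename a whole directory into place, as an output commit would
        os.rename(attempt, final)
        results["rename_dir"] = (
            os.path.isfile(os.path.join(final, "part-00000"))
            and not os.path.exists(attempt)
        )

        # list the committed directory
        results["list"] = sorted(os.listdir(final)) == ["part-00000"]

        # recursive delete
        shutil.rmtree(base)
        results["delete_recursive"] = not os.path.exists(base)
    finally:
        # clean up the scratch directory even if a check raised
        shutil.rmtree(base, ignore_errors=True)
    return results


if __name__ == "__main__":
    # A temporary directory stands in for the real mount point.
    with tempfile.TemporaryDirectory() as mount_point:
        for name, ok in check_posix_semantics(mount_point).items():
            print(name, "ok" if ok else "FAILED")
```

Passing checks like these on the mount point would not prove the AFS class
is correct for every MapReduce job, which is why I am asking about the
regression tests below.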
My questions are:
1. How does the Hadoop community do MapReduce regression testing for
updates to Hadoop HDFS and Hadoop MapReduce?
2. Besides the MapReduce wordcount and terasort examples, are there any
filesystem interfaces that MapReduce applications need and that I may be
missing? Since the filesystem has POSIX support, hsync is also supported.
3. According to my tests, the performance is worse than HDFS + MapReduce.
Any suggestions or hints on the performance analysis? (Without MapReduce,
the performance of the filesystem is better than HDFS and also local
3.1 The following are the same for the performance comparison:
3.1.1 Architecture: 4 nodes for MR, and another 4 different nodes for
3.1.2 Application: the input size, the number of mappers and reducers are