RE: How to add another file system in Hadoop
Agarwal, Nikhil 2013-02-22, 05:05
Thanks a lot for taking the time to answer my question.
Ling, thank you for directing me to glusterfs. It will certainly be a great help, but what I wanted to know concerns this passage in README.txt:
>> # ./bin/start-mapred.sh
If the map/reduce job/task trackers are up, all I/O will be done to GlusterFS.
So, suppose my input files are scattered across different nodes (glusterfs servers). How do I (a Hadoop client with glusterfs plugged in) issue a MapReduce command?
Moreover, after I issue a MapReduce command, would my Hadoop client fetch all the data from the different servers to my local machine and run MapReduce there, or would it start the TaskTracker daemons on the machine(s) where the input file(s) are located and run MapReduce there?
Please correct me if I am wrong, but I suppose that the location of the input files is returned to MapReduce by the function getFileBlockLocations(FileStatus file, long start, long len).
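To make that concrete, here is a minimal Java sketch of what I understand such a function to do: given a byte range of a file, report which servers hold the blocks overlapping it. It is only a model under my own assumptions. The class BlockLocationSketch, the nested Block type, the locate method and the host names are all hypothetical and are not Hadoop's or GlusterFS's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a block-location lookup; not Hadoop's real API.
public class BlockLocationSketch {

    // Stand-in for a block descriptor: a byte range plus the hosts storing it.
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Return every block that overlaps the byte range [start, start + len).
    static List<Block> locate(List<Block> blocks, long start, long len) {
        List<Block> hits = new ArrayList<>();
        long end = start + len;
        for (Block b : blocks) {
            long blockEnd = b.offset + b.length;
            if (b.offset < end && blockEnd > start) {
                hits.add(b);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed layout: one file split into three blocks on three servers.
        List<Block> file = Arrays.asList(
                new Block(0, 64, "server-a"),
                new Block(64, 64, "server-b"),
                new Block(128, 32, "server-c"));

        // Ask which servers hold bytes 60..69 of the file.
        for (Block b : locate(file, 60, 10)) {
            System.out.println("offset " + b.offset + " -> " + b.hosts);
        }
    }
}
```

If this picture is right, the host lists are what would let map tasks be placed near the data instead of pulling everything to the client, which is exactly the part I would like confirmed.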
Thank you very much for your time and for helping me out.
From: Agarwal, Nikhil
Sent: Thursday, February 21, 2013 4:19 PM
To: '[EMAIL PROTECTED]'
Subject: How to add another file system in Hadoop
I am planning to add a file system called CDMI under org.apache.hadoop.fs in Hadoop, similar to KFS or S3, which are already there under org.apache.hadoop.fs. Suppose I write my file system for CDMI and add the package under fs: how do I then tell core-site.xml or the other configuration files to use the CDMI file system? Where do I need to make changes for the CDMI file system to become a part of Hadoop?
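My guess, judging from the existing entries for KFS and S3, is that Hadoop maps a URI scheme to an implementation class through a configuration property of the form fs.<scheme>.impl, so a CDMI file system would need an fs.cdmi.impl entry in core-site.xml. The Java sketch below models that assumed lookup with a plain Map in place of the real configuration. SchemeLookupSketch, SketchFileSystem, CdmiSketchFileSystem and the cdmi:// URI are hypothetical names of my own, not Hadoop classes.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of scheme-to-class resolution; not Hadoop's real code.
public class SchemeLookupSketch {

    // Stand-in for the file system base type.
    interface SketchFileSystem {
        String scheme();
    }

    // Stand-in for the CDMI implementation I would write.
    public static final class CdmiSketchFileSystem implements SketchFileSystem {
        public CdmiSketchFileSystem() {}

        public String scheme() {
            return "cdmi";
        }
    }

    // Look up "fs.<scheme>.impl" and instantiate the named class by reflection.
    static SketchFileSystem resolve(Map<String, String> conf, String uri) throws Exception {
        String scheme = URI.create(uri).getScheme();
        String className = conf.get("fs." + scheme + ".impl");
        if (className == null) {
            throw new IllegalArgumentException("No file system configured for scheme: " + scheme);
        }
        return (SketchFileSystem) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Plays the role of the property I assume would go into core-site.xml.
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.cdmi.impl", CdmiSketchFileSystem.class.getName());

        SketchFileSystem fs = resolve(conf, "cdmi://server.example/container/file");
        System.out.println(fs.getClass().getName() + " handles " + fs.scheme());
    }
}
```

If the real mechanism works like this, then adding the property and putting my classes on the classpath might be all that is needed, but I would like to know whether anything else has to change.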
Thanks a lot in advance.