Thank you, Niels and Nitin, for your replies.
Actually, I want to run MR on a cloud store, which is open source. So I thought of implementing a file system for it and plugging it into Hadoop, just as S3/KFS are plugged in. That would enable a Hadoop client to talk to "my cloud store". But I am still not clear on how to run MR on the cloud store using Hadoop's JobTracker/TaskTracker framework.
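For context, a custom file system is wired in the same way S3 is: a class extending org.apache.hadoop.fs.FileSystem is registered in the configuration under a URI scheme. A minimal sketch, assuming a hypothetical scheme "mycloud" and implementation class MyCloudFileSystem (both names are placeholders, not anything that exists in Hadoop):

```xml
<!-- core-site.xml: register a hypothetical "mycloud" scheme.
     com.example.fs.MyCloudFileSystem would extend org.apache.hadoop.fs.FileSystem. -->
<property>
  <name>fs.mycloud.impl</name>
  <value>com.example.fs.MyCloudFileSystem</value>
</property>
<!-- Optionally make it the default file system. -->
<property>
  <name>fs.default.name</name>
  <value>mycloud://host:port/</value>
</property>
```

Paths such as mycloud://host:port/dir/file could then be given as MR input/output paths, just as s3n:// paths are.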
The link Niels gave shows that I can run MR on the local file system. So is there a way to tell the JobTracker to read data from a set of nodes (the nodes of "my cloud store" in this case), run TaskTracker daemons on those nodes, and fetch the result of the MR job?
Note: I do not want to fetch the data to my local computer, as happens with S3. Fetching the data would defeat the purpose of using Hadoop (which is moving compute to the data).
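As far as I understand, the JobTracker does not deploy TaskTrackers itself: the TaskTracker daemons have to be started on the storage nodes beforehand, each configured to report to the JobTracker, and locality then comes from the block locations the FileSystem implementation returns from getFileBlockLocations(). A hedged sketch of the per-node configuration (the host name is a placeholder):

```xml
<!-- mapred-site.xml on each storage node, pointing the local
     TaskTracker at the cluster's JobTracker.
     "jobtracker-host" is a placeholder host name. -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:9001</value>
</property>
```

With TaskTrackers running on the storage nodes and the custom FileSystem reporting real node names as block locations, the JobTracker can schedule map tasks on the nodes that hold the data, so nothing needs to be fetched to the client machine.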
From: Agarwal, Nikhil
Sent: Sunday, February 17, 2013 11:53 AM
To: '[EMAIL PROTECTED]'
Subject: Can I perform an MR on my local filesystem
Recently I followed a blog to run Hadoop on a single node cluster.
I wanted to ask: in a single-node set-up of Hadoop, is it necessary to have the data copied into HDFS before running an MR job on it? Can I run MR on my local file system without copying the data to HDFS?
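For reference, running MR directly on the local file system is a matter of configuration; these are essentially the defaults of standalone mode:

```xml
<!-- core-site.xml: use the local file system instead of HDFS. -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
<!-- mapred-site.xml: run the whole job in a single local JVM,
     with no JobTracker/TaskTracker daemons involved. -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
```

Input and output paths are then ordinary local paths, and no copy into HDFS is needed.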
In the Hadoop source code I saw that there are implementations of other file systems too, like S3, KFS, and FTP. So how exactly does an MR job run on an S3 data store? How do the JobTracker and TaskTracker work with S3?
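From what I can tell, with S3 the JobTracker and TaskTrackers run on your own cluster, not inside S3; the tasks simply stream their input splits from S3 over the network (there is no data locality, since S3 does not expose block locations). An illustrative invocation, where the bucket name and credentials are placeholders:

```
hadoop jar hadoop-examples.jar wordcount \
  s3n://ACCESSKEY:SECRETKEY@my-bucket/input \
  s3n://ACCESSKEY:SECRETKEY@my-bucket/output
```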
I would be very thankful to get a reply to this.
Thanks & Regards,