I recently followed a blog post to set up Hadoop on a single-node cluster.
In a single-node Hadoop setup, is it necessary to copy the data into HDFS before running a MapReduce (MR) job on it? Can I run an MR job directly on my local file system without copying the data to HDFS?
In the Hadoop source code I saw implementations of other file systems as well, such as S3, KFS, and FTP. How exactly does an MR job run on data stored in S3? How do the JobTracker and TaskTrackers run when the data is in S3?
I would be very grateful for a reply.
Thanks & Regards,