Hadoop >> mail # user >> Re: Can I perform an MR on my local filesystem

Re: Can I perform an MR on my local filesystem
Hi Nikhil,

The JobTracker doesn't do any deployment of other daemons. They are
expected to be installed and started on other nodes separately.

If I understand your question more broadly, MR doesn't necessarily run its
map and reduce tasks on the nodes that contain the data. All data is read
through the FileSystem interface, which, depending on the file system,
likely pulls from remote machines. When MR runs on the same cluster where
the data is stored (as in HDFS, but not S3, as far as I know), some effort
is made to run map tasks on the nodes that contain their inputs, but this
is best-effort and controlled by the scheduler.
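As a concrete illustration of the local-filesystem case (a minimal sketch,
assuming the classic pre-YARN Hadoop 1.x property names): pointing
fs.default.name at file:/// makes jobs read and write the local filesystem,
and setting mapred.job.tracker to "local" runs the tasks in-process with the
local job runner, with no JobTracker/TaskTracker daemons involved.

```xml
<!-- core-site.xml: use the local filesystem instead of HDFS -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>

<!-- mapred-site.xml: run map/reduce tasks in-process
     via the local job runner -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
```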

I hope that helps.
-Sandy

On Sun, Feb 17, 2013 at 3:18 AM, Agarwal, Nikhil
<[EMAIL PROTECTED]> wrote:

>  Hi,
>
> Thank you Niels and thank you Nitin for your reply.
>
> Actually, I want to run MR on a cloud store, which is open source. So I
> thought of implementing a file system for the same and plugging it into
> Hadoop, just like S3/KFS. This would enable a Hadoop client to talk to
> “My cloud store”. But I do not have further clarity on how to run MR on
> the cloud using the JobTracker/TaskTracker framework of Hadoop.
>
> As per the link given by Niels, I can run MR on the local file system.
> So is there any way of telling the JobTracker to read data from a set of
> nodes, deploy TaskTracker daemons on those nodes (which would be “My
> cloud store” in this case), and fetch the result of the MR job?
>
> Note: I do not want to fetch the data to my local computer, as is the
> case with S3. Fetching the data would defeat the purpose of using Hadoop
> (which is moving compute to data).
>
> Thanks,
>
> Nikhil
>
> *From:* Agarwal, Nikhil
> *Sent:* Sunday, February 17, 2013 11:53 AM
> *To:* '[EMAIL PROTECTED]'
> *Subject:* Can I perfrom a MR on my local filesystem****
>
> ** **
>
> Hi,****
>
> Recently I followed a blog to run Hadoop on a single node cluster.****
>
> I wanted to ask that in a single node set-up of Hadoop is it necessary to
> have the data copied into Hadoop’s HDFS before running a MR on it. Can I
> run MR on my local file system too without copying the data to HDFS? ****
>
> In the Hadoop source code I saw there are implementations of other file
> systems too like S3, KFS, FTP, etc. so how does exactly a MR happen on S3
> data store ? How does JobTracker or Tasktracker run in S3 ? ****
>
> ** **
>
> I would be very thankful to get a reply to this.****
>
> ** **
>
> Thanks & Regards,****
>
> Nikhil****
>
> ** **
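The pluggable-filesystem approach Nikhil describes follows the same pattern
as the existing S3/KFS bindings: subclass org.apache.hadoop.fs.FileSystem
and map a URI scheme to the class in the configuration. A rough,
hypothetical skeleton follows; the scheme "mycloud", the class name, and
all stub bodies are made up for illustration and would need real
implementations against the actual cloud store:

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

// Hypothetical sketch: a Hadoop FileSystem backed by "My cloud store".
// Registered in core-site.xml via
//   <name>fs.mycloud.impl</name>
//   <value>com.example.MyCloudFileSystem</value>
// so that paths like mycloud://container/dir/file resolve to this class.
public class MyCloudFileSystem extends FileSystem {
  private URI uri;
  private Path workingDir = new Path("/");

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    super.initialize(name, conf);
    this.uri = name;
    // open a connection to the cloud store here
  }

  @Override
  public URI getUri() { return uri; }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    // stream the object behind 'f' out of the store; MR map tasks
    // pull their input splits through this call
    throw new IOException("sketch only - not implemented");
  }

  @Override
  public FSDataOutputStream create(Path f, FsPermission permission,
      boolean overwrite, int bufferSize, short replication, long blockSize,
      Progressable progress) throws IOException {
    // write an object into the store; reducers emit output through this
    throw new IOException("sketch only - not implemented");
  }

  @Override
  public FSDataOutputStream append(Path f, int bufferSize,
      Progressable progress) throws IOException {
    throw new IOException("append not supported");
  }

  @Override
  public boolean rename(Path src, Path dst) throws IOException {
    throw new IOException("sketch only - not implemented");
  }

  @Override
  public boolean delete(Path f, boolean recursive) throws IOException {
    throw new IOException("sketch only - not implemented");
  }

  @Override
  public FileStatus[] listStatus(Path f) throws IOException {
    // list the objects under 'f'; used for input-split calculation
    throw new IOException("sketch only - not implemented");
  }

  @Override
  public void setWorkingDirectory(Path dir) { workingDir = dir; }

  @Override
  public Path getWorkingDirectory() { return workingDir; }

  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    throw new IOException("sketch only - not implemented");
  }

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    throw new IOException("sketch only - not implemented");
  }
}
```

Note that, as Sandy says above, this gives a Hadoop client access to the
store but does not by itself place TaskTrackers on the storage nodes; the
daemons still have to be installed and started there separately for any
data locality to be possible.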
>