MapReduce >> mail # user >> TaskTrackers behind NAT


Re: TaskTrackers behind NAT

On Jul 18, 2011, at 12:53 PM, Ben Clay wrote:

> I'd like to spread Hadoop across two physical clusters, one which is
> publicly accessible and the other which is behind a NAT. The NAT'd machines
> will only run TaskTrackers, not HDFS, and not Reducers either (configured
> with 0 Reduce slots).  The master node will run in the publicly-available
> cluster.

Off the top of my head, I doubt it will work: MR is bi-directional across many random ports, so I suspect there will be a lot of hackiness in the network config to make this work.

> 1. Port 50060 needs to be opened for all NAT'd machines, since Reduce tasks
> fetch intermediate data from http://<tasktracker>:50060/mapOutput,
> correct ?  I'm getting "Too many fetch-failures" with no open ports, so I
> assume the Reduce tasks need to pull the intermediate data instead of Map
> tasks pushing it.

Correct. Reduce tasks pull.
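For reference, the port the reducers pull from is the TaskTracker's HTTP address, which can be pinned in mapred-site.xml so the NAT/firewall rule only ever needs to forward one known port per machine. A sketch (0.0.0.0:50060 is already the default in Hadoop 0.20/1.x; shown explicitly here):

```xml
<!-- mapred-site.xml on each NAT'd TaskTracker: pin the shuffle (HTTP)
     port so the firewall/NAT only has to forward this one port.
     This is the stock default, written out explicitly. -->
<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:50060</value>
</property>
```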

> 2. Although the NAT'd machines have unique IPs and reach the outside, the
> DHCP is not assigning them hostnames.  Therefore, when they join the
> JobTracker I get
> "tracker_localhost.localdomain:localhost.localdomain/127.0.0.1" on the
> machine list page.  Is there some way to force Hadoop to refer to them via
> IP instead of hostname, since I don't have control over the DHCP? I could
> manually assign a hostname via /etc/hosts on each NAT'd machine, but these
> are actually VMs and I will have many of them receiving semi-random IPs,
> making this an ugly administrative task.
Short answer: no.

Long answer: no, fix your DHCP and/or do the /etc/hosts hack.
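If you do end up doing the /etc/hosts hack across many VMs, the bookkeeping can at least be scripted rather than done by hand. A minimal sketch that turns a list of VM IPs into /etc/hosts lines with stable synthetic hostnames (the `slaveNN` naming is my assumption, not anything Hadoop requires):

```python
# Generate /etc/hosts entries for a list of NAT'd VM IPs so each VM
# gets a stable, unique hostname instead of localhost.localdomain.
def hosts_entries(ips, prefix="slave"):
    """Return /etc/hosts lines mapping each IP to a synthetic hostname."""
    return ["%s\t%s%02d" % (ip, prefix, i) for i, ip in enumerate(ips, 1)]

if __name__ == "__main__":
    # Print lines ready to append to /etc/hosts on each VM.
    for line in hosts_entries(["10.0.0.11", "10.0.0.12"]):
        print(line)
```

You would still have to push the output to each VM (and keep it in sync as DHCP hands out new leases), which is exactly the administrative ugliness you mentioned; fixing DHCP/DNS remains the cleaner answer.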