Hadoop >> mail # user >> Hadoop cluster network requirement


jonathan.hwang@... 2011-07-31, 19:08
Allen Wittenauer 2011-08-01, 02:16
Saqib Jang -- Margalla Co... 2011-08-01, 02:30
Re: Hadoop cluster network requirement

On Jul 31, 2011, at 7:30 PM, Saqib Jang -- Margalla Communications wrote:

> Thanks, I'm independently doing some digging into Hadoop networking
> requirements and had a couple of quick follow-ups. Could I have some
> specific info on why different data centers cannot be supported for
> master node and data node comms? Also, what may be the benefits/use
> cases for such a scenario?

Most people who try to put the NameNode (NN) and DataNodes (DNs) in different data centers are trying to achieve disaster recovery: one file system in multiple locations. That isn't the way HDFS is designed, and it will end in tears. There are multiple problems:

1) there is no guarantee that one replica of each block will be in each data center (thereby defeating the whole purpose!)
2) assuming one can work out problem 1, during a network break the NN will lose contact with half of the DNs, causing a massive re-replication storm on the network
3) if one is using MR on top of this HDFS, the shuffle will likely kill the network in between (making MR performance pretty dreadful) and is going to cause delays for the DN heartbeats
4) I don't even want to think about rebalancing.
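
To make problem 1 concrete, here is a rough simulation sketch. It assumes a simplified rack-aware placement rule (first replica on a random rack, second on a different rack, third on the same rack as the second) and two hypothetical data centers, dc_a and dc_b, with the same number of racks each. The function names and the model are illustrative only; this is not HDFS code, and the real placement policy has more cases.

```python
import random

def place_block(racks_per_dc, rng):
    """Return the set of data centers that hold a replica of one block.

    Simplified model (an assumption, not the real HDFS policy):
    replica 1 lands on a random rack, replica 2 on a different rack,
    replica 3 on the same rack as replica 2.
    """
    racks = [(dc, r) for dc in ("dc_a", "dc_b") for r in range(racks_per_dc)]
    first = rng.choice(racks)
    second = rng.choice([rack for rack in racks if rack != first])
    third = second  # third replica shares a rack with the second
    return {first[0], second[0], third[0]}

def fraction_single_dc(racks_per_dc, blocks, seed=0):
    """Fraction of simulated blocks whose replicas all sit in one data center."""
    rng = random.Random(seed)
    confined = sum(
        len(place_block(racks_per_dc, rng)) == 1 for _ in range(blocks)
    )
    return confined / blocks

if __name__ == "__main__":
    print(fraction_single_dc(racks_per_dc=4, blocks=20000))
```

Under this model, with R racks per data center the chance that a block never leaves its first data center is (R-1)/(2R-1), so with 4 racks per site roughly 3 out of 7 blocks would have no copy at the other site. The placement rule only knows about racks, not data centers, which is why nothing forces a replica across the link.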

... and I'm sure there are a lot of other problems I'm forgetting at the moment. So don't do it.
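
For a rough sense of scale on problem 2, a back-of-the-envelope sketch follows. Every number in it is hypothetical (cluster size, data per node, per-node copy rate), and it is an upper bound under simple assumptions: every replica on an unreachable DN gets re-created on a surviving DN, blocks with no surviving replica are ignored, and the survivors are assumed to have enough free disk.

```python
def rereplication_estimate(datanodes, used_tb_per_node, lost_fraction,
                           copy_mb_per_s_per_node):
    """Estimate the data volume and time to re-replicate after a partition.

    Returns (terabytes_to_copy, hours). Uses decimal units (1 TB = 1e6 MB).
    All inputs are illustrative assumptions, not measured values.
    """
    lost = int(datanodes * lost_fraction)
    survivors = datanodes - lost
    tb_to_copy = lost * used_tb_per_node
    # Aggregate copy rate if every surviving node copies at the given rate.
    aggregate_mb_per_s = survivors * copy_mb_per_s_per_node
    hours = tb_to_copy * 1e6 / aggregate_mb_per_s / 3600
    return tb_to_copy, hours

if __name__ == "__main__":
    tb, hours = rereplication_estimate(
        datanodes=100, used_tb_per_node=10,
        lost_fraction=0.5, copy_mb_per_s_per_node=50,
    )
    print(f"{tb} TB to copy, about {hours:.1f} hours")
```

With those made-up inputs the surviving half would have to copy 500 TB, which takes more than two days even at a sustained 50 MB/s per node, and all of that traffic competes with normal work. When the link comes back, the NN then has to deal with the over-replicated blocks.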

If you want disaster recovery, set up two completely separate HDFSes and run everything in parallel.
Michael Segel 2011-08-01, 23:57
Mohit Anchlia 2011-08-02, 00:29
Allen Wittenauer 2011-08-03, 01:04