Hadoop >> mail # user >> Why inter-rack communication in mapreduce slow?


RE: Why inter-rack communication in mapreduce slow?

We've looked at the network problem for the past two years.

10GbE is now a reality; even a year ago, prices were still at a premium.
Now that 10GbE and 2TB drives are at a price sweet spot, you really need to sit down and think through your cluster design.
You need to look at things in terms of power usage, density, cost, and vendor relationships...

There's really more to this problem.

If you're looking for a simple answer: if you run 1GbE top-of-rack (TOR) switches, buy a reliable switch that not only has better uplinks but also lets you bond ports into a fatter pipe.
Arista and Blade Network Technologies (now part of IBM) are producing some interesting switches, and the prices aren't too bad. (Plus they claim to play nice with Cisco switches...)
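Bonding ports on the host side can be sketched as below. This is a minimal, illustrative RHEL/CentOS-style config, not a recipe: the file name, device names, and addresses are hypothetical, and the TOR switch must be configured with a matching LACP port-channel for 802.3ad mode to come up.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0  (illustrative)
# Bonds two 1GbE NICs into one logical link using LACP (802.3ad).
# miimon=100 polls link state every 100ms so a dead port is dropped quickly.
DEVICE=bond0
TYPE=Bond
BONDING_OPTS="mode=802.3ad miimon=100"
IPADDR=10.1.1.20
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
```

Each physical NIC then gets its own ifcfg file with `MASTER=bond0` and `SLAVE=yes`. Note that LACP hashes per flow, so two bonded 1GbE ports give ~2Gbps of aggregate throughput across many connections, not a single 2Gbps stream.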

If you're looking to build out a very large cluster... you really need to take a holistic approach.
HTH

-Mike

> Date: Tue, 7 Jun 2011 10:47:25 +0100
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Why inter-rack communication in mapreduce slow?
>
> On 06/06/2011 02:40 PM, John Armstrong wrote:
> > On Mon, 06 Jun 2011 09:34:56 -0400,<[EMAIL PROTECTED]>  wrote:
> >> Yeah, that's a good point.
>
> >
> > In fact, it almost makes me wonder if an ideal setup is not only to have
> > each of the main control daemons on their own nodes, but to put THOSE nodes
> > on their own rack and keep all the data elsewhere.
>
> I'd give them a 10Gbps connection to the main network fabric, as with any
> ingress/egress nodes whose aim in life is to get data into and out of
> the cluster. There's a lot to be said for fast nodes within the
> datacentre that don't host datanodes, as that way their writes get
> scattered everywhere, which is what you need when loading data into HDFS.
>
> You don't need separate racks for this, just more complicated wiring.
>
> -steve
>
> (disclaimer, my network knowledge generally stops at Connection Refused
> and No Route to Host messages)