RE: Cluster hard drive ratios


Sorry my math is off. I keep thinking in terms of TB per core and not drives. :-)

To be honest, I don't know if I would recommend 6-core CPUs.

We're running on what is now considered 'old hardware' (Intel Xeon E5500 series).
Yes, we saw that with 8 cores and 4 drives we were limited by the number of drives.
Pushing that up to 8 or 12 drives would mean that the disks would be less of a bottleneck.
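
To make the ratios concrete, here is a minimal sketch of the cores-to-spindles arithmetic for the configurations mentioned in this thread. The function name and the dictionary of configurations are illustrative only, and the ratio says nothing about where the real bottleneck lands for a given workload.

```python
def cores_per_spindle(total_cores: int, drives: int) -> float:
    """Return the number of CPU cores per data drive (spindle)."""
    if drives <= 0:
        raise ValueError("drives must be positive")
    return total_cores / drives

# (total cores, drives) for the setups discussed in the thread.
configs = {
    "8 cores, 4 drives": (8, 4),
    "8 cores, 8 drives": (8, 8),
    "8 cores, 12 drives": (8, 12),
    "12 cores, 12 drives (dual hexacore)": (12, 12),
}

ratios = {name: cores_per_spindle(c, d) for name, (c, d) in configs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} cores per spindle")
```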

But then you're looking at memory, which is also a limiting factor...
You're going to have to look at 10GbE. And then your ToR (top-of-rack) switch is going to be an issue.
Not all networking hardware vendors are equal. You'll want to make sure that your trunk between racks is more than 10GbE if all of your ports are running 10GbE.
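
As a rough sanity check on the network side, the sketch below estimates two things: how oversubscribed a rack uplink is, and an idealized lower bound on the time to push one node's worth of data over a single link. The 20-nodes-per-rack figure and the 40 Gb/s uplink are assumptions for illustration, not numbers from this thread, and real re-replication is spread across many nodes and never runs at full line rate.

```python
def oversubscription(nodes_per_rack: int, port_gbps: float, uplink_gbps: float) -> float:
    """Ratio of total node port bandwidth to the rack's uplink (trunk) bandwidth."""
    return nodes_per_rack * port_gbps / uplink_gbps

def transfer_hours(terabytes: float, link_gbps: float) -> float:
    """Idealized hours to move `terabytes` (decimal TB) over one link at line rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600

# Assumed rack: 20 nodes with 10GbE ports (hypothetical figures).
same_speed_trunk = oversubscription(20, 10, 10)  # trunk no faster than one port
faster_trunk = oversubscription(20, 10, 40)      # assumed 40 Gb/s trunk

# 24TB per node is the figure from the quoted discussion.
hours_1gbe = transfer_hours(24, 1)
hours_10gbe = transfer_hours(24, 10)

print(f"10 Gb/s trunk: {same_speed_trunk:.0f}:1 oversubscribed")
print(f"40 Gb/s trunk: {faster_trunk:.0f}:1 oversubscribed")
print(f"24TB over 1GbE: at least {hours_1gbe:.1f} hours")
print(f"24TB over 10GbE: at least {hours_10gbe:.1f} hours")
```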

Everybody has an opinion on this. Outside of Facebook and Yahoo! I don't know of anyone who is really running large clouds and is willing to talk about it.
> Date: Wed, 4 May 2011 13:59:38 -0500
> Subject: RE: Cluster hard drive ratios
> Mike,
> Thanks for the response. It looks like this discussion forked on the CDH
> list so I have two different conversations now. Also, you're dead on
> that one of the presentations I was referencing was Ravi's.
> With your setup I agree that it would have made no sense to go the 2.5"
> drive route given it would have forced you into the 500-750GB SATA
> drives and all it would allow is more spindles but less capacity at a
> higher cost. The servers we have been considering are actually the
> R710's so dual hexacore with 12 spindles of actual capacity is more of a
> 1:1 in terms of cores to spindles vs the 2:1 I have been reviewing. My
> original issue attempted to focus more around at what point do you
> actually see a plateau in write performance of cores:spindles but since
> we are headed that direction anyway it looks like it was more to sate
> curiosity than driving specifications.
> As to your point, I forgot to include the issue of rebalancing in the
> original email but you are absolutely right. That was another major
> concern especially as we would get closer to filling capacity of a 24TB
> box. I think the original plan was bonded GbE but I think our
> infrastructure team has told us 10GbE would be standard.
> Matt
> -----Original Message-----
> From: Michael Segel [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 04, 2011 1:26 PM
> Subject: RE: Cluster hard drive ratios
> Hi Matt.
> I think you attended Ravi's presentation....
> One of the reasons we used 4 drives per node is that our nodes are in 1U
> boxes and you can only fit 4 3.5" SATA drives in those boxes. Could we
> have gone for more drives using 2.5" SATA drives? Yes, but then you will
> reduce the amount of disk per node and you would increase your cost per
> node.
> Looking at newer boxes. (C Series from Dell which didn't exist when we
> started...) 12 drives would be 2 drives per core if you went with 6
> cores, or 3 drives per core if you went with 4 core cpus.
> The issue raised here is that using 2TB drives, you now have 24TB of
> disk per node.
> So if you lost a node, that's a lot of background replication occurring.
> IMHO, this would be less of an issue if you went with 10GbE (Solarflare
> cards, for which Dell is a reseller) and then a good 10GbE ToR.
> I haven't tried this configuration, so I don't know how well it would
> perform.
> My guess is that with 10GbE, you'd be ok...
> -Mike
> ----------------------------------------
> > Date: Wed, 4 May 2011 08:43:33 -0700
> > Subject: Cluster hard drive ratios
> >
> > I have been reviewing quite a few presentations on the web from
> > various businesses, in addition to the ones I watched first hand at
> > the Cloudera data summit last week, and I am curious as to others'
> > thoughts around hard drive ratios. Various sources including Cloudera