Does hyperthreading affect this ratio?
On Oct 12, 2012 9:36 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
> First, the obvious caveat... YMMV.
> Having said that:
> The key here is to take a look across the various jobs that you will run.
> Some may be more CPU intensive, others more I/O intensive.
> If you monitor these jobs via Ganglia, when you have too few spindles you
> should see CPU wait (iowait) rise on the machines in the cluster. That is to
> say, you are putting extra load on the systems because they are
> waiting for the disks to catch up.
> If you increase the ratio of disks to CPU, you should see that load drop,
> since fewer CPU cycles are wasted waiting on I/O.
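Ganglia aside, the same CPU-wait signal can be sampled directly on a Linux node. A minimal sketch, assuming Linux's `/proc/stat` layout (the aggregate `cpu` line lists user, nice, system, idle, iowait, irq, softirq, steal, ... in clock ticks); the function names are hypothetical. It compares two samples and reports the share of elapsed CPU time spent in iowait:

```python
def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line (the first line of /proc/stat on Linux)."""
    with open(path) as f:
        return f.readline()


def iowait_fraction(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two 'cpu' lines from /proc/stat."""

    def ticks(line: str) -> list[int]:
        fields = line.split()
        if fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Keep user, nice, system, idle, iowait, irq, softirq, steal;
        # the later guest fields are already included in user/nice.
        return [int(x) for x in fields[1:9]]

    deltas = [b - a for a, b in zip(ticks(before), ticks(after))]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0  # index 4 is iowait
```

On a live node, call `read_cpu_line()` twice a few seconds apart and pass both lines to `iowait_fraction()`; a value that climbs as job load rises is the "too few spindles" symptom described above.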
> Note that it's not just the number of spindles: the bus and the
> controller cards also affect disk I/O throughput.
> Now, just IMHO: there was a discussion about some of the CPU recommendations.
> Up to a point, it doesn't matter that much. You want to maximize the bang for
> the buck you get with your hardware purchase.
> Use the ratio as a buying guide. Below a ratio of 1 disk per core,
> you're wasting the CPU that you bought.
> Go higher than a ratio of 1, say 1.5, and you may be buying too many
> spindles without seeing a performance gain that offsets the cost.
> Search for a happy medium and don't sweat the maximum performance that you
> may get.
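The buying guide above reduces to simple arithmetic. A sketch, assuming the ratio counts data disks against physical cores (the thread leaves the hyperthreading question open) and using a hypothetical ±0.1 tolerance band around 1; the function names are illustrative. The usage lines check the two 12-disk, 2U configurations Hank Cohen asks about later in the thread:

```python
def spindles_per_core(disks: int, sockets: int, cores_per_socket: int) -> float:
    """Data disks per physical core (hyperthreads not counted)."""
    return disks / (sockets * cores_per_socket)


def buying_hint(ratio: float, tolerance: float = 0.1) -> str:
    """Map a spindle/core ratio onto the rule of thumb from this thread.

    The tolerance band around 1.0 is an arbitrary illustrative choice.
    """
    if ratio < 1 - tolerance:
        return "below 1: CPU you paid for may sit idle waiting on disk"
    if ratio > 1 + tolerance:
        return "above 1: extra spindles may not repay their cost"
    return "about 1: balanced starting point"


print(spindles_per_core(12, 2, 4))  # 1.5 (dual 4-core, 12 drives)
print(spindles_per_core(12, 2, 6))  # 1.0 (dual 6-core, 12 drives)
```

The hint is only a starting point; as the replies point out, the right ratio depends on whether the workload is CPU- or I/O-bound.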
> On Oct 12, 2012, at 4:19 PM, Jeffrey Buell <[EMAIL PROTECTED]> wrote:
> > I've done some experiments along these lines. I'm using
> > high-performance 15K RPM SAS drives instead of the more usual SATA drives,
> > which should reduce the number of drives I need. I have dual 4-core
> > processors at 3.6 GHz. These are more powerful than the average 4-core
> > processor, which should increase the number of drives I need. Assuming
> > these 2 effects cancel, my results should also apply to machines with
> > SATA drives and average processors. Using 8 drives (1 per core) gets good
> > performance for teragen and terasort. Going to 12 drives (1.5 per core)
> > increases terasort performance by 15%. That might not seem like much
> > compared to increasing the number of drives by 50%, but a better comparison
> > is that 4 extra drives increased the cost of each machine by only about
> > 12%, so the extra drives are (barely) worth it. If you're more time
> > sensitive than cost sensitive, then they're definitely worth it. The extra
> > drives did not help teragen, apparently because both the CPU and the
> > internal storage controller were close to saturation. So, of course,
> > everything depends on the app. You're shooting for saturated CPUs and disk
> > bandwidth. Check that the CPU is not saturated (after checking Hadoop
> > tuning and optimizing the number of tasks). Check that you have enough
> > memory for more tasks with room left over for a large buffer cache. Use
> > 10 GbE networking or make sure the network has enough headroom. Check that
> > the storage controller can handle more bandwidth. If all are true (that
> > is, no other bottlenecks), consider adding more drives.
> > Jeff
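Jeff's "(barely) worth it" conclusion is a throughput-per-dollar comparison. A minimal sketch of that arithmetic, assuming the per-machine cost increase and the benchmark speedup he reports are the only inputs (power, rack space, and similar costs ignored); the function name is hypothetical:

```python
def perf_per_cost_change(perf_gain: float, cost_gain: float) -> float:
    """Relative change in throughput per dollar for an upgrade.

    Both arguments are fractional increases, e.g. 0.15 for +15%.
    """
    return (1 + perf_gain) / (1 + cost_gain) - 1


# Jeff's terasort numbers: +15% performance for +12% machine cost.
print(f"terasort: {perf_per_cost_change(0.15, 0.12):+.1%}")  # terasort: +2.7%
# teragen saw no benefit; treating that as exactly 0% is an assumption.
print(f"teragen:  {perf_per_cost_change(0.0, 0.12):+.1%}")  # teragen:  -10.7%
```

A small positive number matches "barely worth it". For the time-sensitive case, if the 15% throughput gain translates directly into runtime, a terasort job takes about 1/1.15 ≈ 87% as long.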
> >> -----Original Message-----
> >> From: Hank Cohen [mailto:[EMAIL PROTECTED]]
> >> Sent: Friday, October 12, 2012 1:46 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: RE: Spindle per Cores
> >> What empirical evidence is there for this rule of thumb?
> >> In other words, what tests or metrics would indicate an optimal
> >> spindle/core ratio and how dependent is this on the nature of the data
> >> and of the map/reduce computation?
> >> My understanding is that there are lots of clusters with more spindles
> >> than cores. Specifically, typical 2U servers can hold 12 3.5" disk
> >> drives. So lots of Hadoop clusters have dual 4-core processors and 12
> >> spindles. Would it be better to have 6-core processors if you are
> >> loading up the boxes with 12 disks? And most importantly, how would
> >> one know that the mix was optimal?
> >> Hank Cohen
> >> Altior Inc.
> >> -----Original Message-----