On Oct 12, 2012 9:59 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
> I think what we are seeing is the ratio based on physical Xeon cores.
> So Hyper-Threading wouldn't change the actual ratio.
> (1 disk per physical core would be 1 disk per 2 virtual cores.)
> Again YMMV and of course thanks to this guy Moore who decided to write
> some weird laws... the ratio could change over time as the CPUs become more
> efficient and faster.
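The arithmetic behind that answer can be written out as a minimal sketch. The function names and the default of 2 hardware threads per core are illustrative assumptions, not something from the thread; the 1.0 and 1.5 ratios are the ones discussed in the messages quoted here.

```python
import math


def disks_per_logical_core(disks_per_physical_core: float,
                           threads_per_core: int = 2) -> float:
    """Restate a disks-per-physical-core ratio in terms of logical cores.

    threads_per_core=2 assumes two hardware threads per physical core.
    """
    return disks_per_physical_core / threads_per_core


def disks_for_node(physical_cores: int, disks_per_physical_core: float) -> int:
    """Number of data disks for a node at a given ratio, rounded up."""
    return math.ceil(physical_cores * disks_per_physical_core)


if __name__ == "__main__":
    # 1 disk per physical core is the same hardware as 0.5 disk per logical core.
    print(disks_per_logical_core(1.0))
    # A dual 4-core node (8 physical cores) at ratios of 1.0 and 1.5.
    print(disks_for_node(8, 1.0), disks_for_node(8, 1.5))
```

The point of the sketch is that enabling hyperthreading changes only the unit in which the ratio is expressed, not the number of disks you would buy.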
> On Oct 12, 2012, at 9:52 PM, ranjith raghunath <
> [EMAIL PROTECTED]> wrote:
> Does hyperthreading affect this ratio?
> On Oct 12, 2012 9:36 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
>> First, the obvious caveat... YMMV
>> Having said that.
>> The key here is to take a look across the various jobs that you will run.
>> Some may be more CPU intensive, others more I/O intensive.
>> If you monitor these jobs via Ganglia, when you have too few spindles you
>> should see the CPU I/O wait rise on the machines in the cluster. That is to
>> say that you are putting an extra load on the systems because you're
>> waiting for the disks to catch up.
>> If you increase the ratio of disks to CPU, you should see that load drop
>> as you are not wasting CPU cycles.
>> Note that it's not just the number of spindles; the bus and the
>> controller cards can also affect the throughput of disk I/O.
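For readers without Ganglia at hand, the "wait CPU" signal described above can be approximated on Linux from two samples of the aggregate `cpu` line in `/proc/stat`. This is a rough sketch, not what Ganglia itself does: the helper names are hypothetical, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into its first 8 counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(x) for x in fields[1:9]]


def iowait_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent waiting on I/O between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    return deltas[4] / total if total > 0 else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # sampling interval; an arbitrary choice
    second = read_cpu_line()
    print(f"iowait over interval: {iowait_fraction(first, second):.1%}")
```

A fraction that stays high while jobs run would be consistent with too few spindles (or a saturated bus or controller); a low fraction suggests the disks are keeping up.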
>> Now just IMHO, there was a discussion on some of the CPU recommendations.
>> To a point, it doesn't matter that much. You want to maximize the bang for
>> the buck you can get with your hardware purchase.
>> Use the ratio as a buying guide. Fewer than a ratio of 1 disk per core,
>> and you're wasting the cpu that you bought.
>> Going higher than a ratio of 1, like 1.5, and you may be buying too many
>> spindles and not see a performance gain that offsets your cost.
>> Search for a happy medium and don't sweat the maximum performance that
>> you may get.
>> On Oct 12, 2012, at 4:19 PM, Jeffrey Buell <[EMAIL PROTECTED]> wrote:
>> > I've done some experiments along these lines. I'm using
>> high-performance 15K RPM SAS drives instead of the more usual SATA drives,
>> which should reduce the number of drives I need. I have dual 4-core
>> processors at 3.6 GHz. These are more powerful than the average 4-core
>> processor, which should increase the number of drives I need. Assuming
>> these 2 effects cancel, then my results should also apply to machines with
>> SATA drives and average processors. Using 8 drives (1-1) gets good
>> performance for teragen and terasort. Going to 12 drives (1.5 per core)
>> increases terasort performance by 15%. That might not seem like much
>> compared to increasing the number of drives by 50%, but a better comparison
>> is that 4 extra drives increased the cost of each machine by only about
>> 12%, so the extra drives are (barely) worth it. If you're more time
>> sensitive than cost sensitive, then they're definitely worth it. The extra
>> drives did not help teragen, apparently because both CPU and the internal
>> storage controller were close to saturation. So, of course everything
>> depends on the app. You're shooting for saturated CPUs and disk bandwidth.
>> Check that the CPU is not saturated (after checking Hadoop tuning and
>> optimizing the number of tasks). Check that you have enough memory for more
>> tasks with room leftover for a large buffer cache. Use 10 GbE networking
>> or make sure the network has enough headroom. Check the storage controller
>> can handle more bandwidth. If all are true (that is, no other
>> bottlenecks), consider adding more drives.
>> > Jeff
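Jeff's "(barely) worth it" judgment can be checked with a small calculation. The sketch below uses only the figures from his message (a 15% terasort gain for about 12% more cost per machine); the function name and the performance-per-cost metric are illustrative assumptions, not his method.

```python
def relative_perf_per_cost(perf_gain: float, cost_increase: float) -> float:
    """Performance per unit cost after an upgrade, relative to before.

    Both arguments are fractions, e.g. 0.15 for +15%.
    A result above 1.0 means the upgrade improves performance per dollar.
    """
    return (1.0 + perf_gain) / (1.0 + cost_increase)


if __name__ == "__main__":
    # Figures from the message above: +15% terasort throughput, +12% machine cost.
    ratio = relative_perf_per_cost(0.15, 0.12)
    print(f"relative performance per cost: {ratio:.3f}")
```

The result is only slightly above 1.0, which matches the "barely" in the message: the extra drives pay for themselves, but not by much.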
>> >> -----Original Message-----
>> >> From: Hank Cohen [mailto:[EMAIL PROTECTED]]
>> >> Sent: Friday, October 12, 2012 1:46 PM
>> >> To: [EMAIL PROTECTED]
>> >> Subject: RE: Spindle per Cores
>> >> What empirical evidence is there for this rule of thumb?
>> >> In other words, what tests or metrics would indicate an optimal