MapReduce >> mail # user >> Spindle per Cores

Re: Spindle per Cores
Thanks Michael.
On Oct 12, 2012 9:59 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:

> I think what we are seeing is the ratio based on physical Xeon cores.
> So hyper threading wouldn't change the actual ratio.
> (1 disk per physical core, would be 1 disk per 2 virtual cores.)
> Again YMMV and of course thanks to this guy Moore who decided to write
> some weird laws... the ratio could change over time as the CPUs become more
> efficient and faster.
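As a rough illustration of the point above, the same disk count gives two different ratios depending on whether physical or hyper-threaded (logical) cores are counted. This is only a sketch: the function name `disks_per_core` and the example node (8 disks, 16 logical cores, 2 threads per core) are assumptions made for illustration, not figures from the thread.

```python
def disks_per_core(disks, logical_cores, threads_per_core=1):
    """Return (disks per physical core, disks per logical core).

    threads_per_core is 2 when hyper-threading is enabled, 1 otherwise.
    All inputs are hypothetical values supplied by the caller.
    """
    if logical_cores % threads_per_core != 0:
        raise ValueError("logical_cores must be a multiple of threads_per_core")
    physical_cores = logical_cores // threads_per_core
    return disks / physical_cores, disks / logical_cores


# Hypothetical node: 8 disks, 8 physical cores exposed as 16 logical cores.
per_physical, per_logical = disks_per_core(8, 16, threads_per_core=2)
print(per_physical, per_logical)
```

The rule of thumb in this thread is stated against the first number, so a node that looks like 0.5 disks per core in the OS view may still be at 1 disk per physical core.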
> On Oct 12, 2012, at 9:52 PM, ranjith raghunath <
> Does hyperthreading affect this ratio?
> On Oct 12, 2012 9:36 PM, "Michael Segel" <[EMAIL PROTECTED]>
> wrote:
>> First, the obvious caveat... YMMV
>> Having said that.
>> The key here is to take a look across the various jobs that you will run.
>> Some may be more CPU intensive, others more I/O intensive.
>> If you monitor these jobs via Ganglia, when you have too few spindles you
>> should see the wait cpu rise on the machines in the cluster.  That is to
>> say that you are putting an extra load on the systems because you're
>> waiting for the disks to catch up.
>> If you increase the ratio of disks to CPU, you should see that load drop
>> as you are not wasting CPU cycles.
>> Note that it's not just the number of spindles, but also the bus and the
>> controller cards that can affect the throughput of disk I/O.
>> Now just IMHO, there was a discussion on some of the CPU recommendations.
>> To a point, it doesn't matter that much. You want to maximize the bang for
>> the buck you can get with your hardware purchase.
>> Use the ratio as a buying guide. With fewer than 1 disk per core,
>> you're wasting the CPU that you bought.
>> Going higher than a ratio of 1, like 1.5, you may be buying too many
>> spindles and not see a performance gain that offsets your cost.
>> Search for a happy medium and don't sweat the maximum performance that
>> you may get.
>> HTH
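The "wait cpu" figure mentioned above can also be checked directly on a Linux node without Ganglia. The following is a minimal sketch, assuming the standard Linux `/proc/stat` layout in which the aggregate `cpu` line lists user, nice, system, idle, iowait, irq, softirq and steal ticks in that order. The helper names (`parse_cpu_line`, `iowait_percent`, `sample_iowait`) are made up for this example, and the sketch says nothing about what iowait level counts as "too few spindles".

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts.

    Only the first eight counters (user .. steal) are kept, because guest
    time is already included in user time.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in parts[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    def read_cpu_line():
        with open(path) as f:
            return f.readline()
    first = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(first, read_cpu_line())
```

Running `sample_iowait()` on a worker while a representative job mix is active gives one number per node; a value that stays high across jobs is the symptom described above, and it should drop if disks (or controller bandwidth) are added.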
>> On Oct 12, 2012, at 4:19 PM, Jeffrey Buell <[EMAIL PROTECTED]> wrote:
>> > I've done some experiments along these lines.  I'm using
>> high-performance 15K RPM SAS drives instead of the more usual SATA drives,
>> which should reduce the number of drives I need.  I have dual 4-core
>> processors at 3.6 GHz.  These are more powerful than the average 4-core
>> processor, which should increase the number of drives I need.  Assuming
>> these 2 effects cancel, then my results should also apply to machines with
>> SATA drives and average processors.  Using 8 drives (1-1) gets good
>> performance for teragen and terasort.  Going to 12 drives (1.5 per core)
>> increases terasort performance by 15%.  That might not seem like much
>> compared to increasing the number of drives by 50%, but a better comparison
>> is that 4 extra drives increased the cost of each machine by only about
>> 12%, so the extra drives are (barely) worth it. If you're more time
>> sensitive than cost sensitive, then they're definitely worth it.  The extra
>> drives did not help teragen, apparently because both CPU and the internal
>> storage controller were close to saturation. So, of course everything
>> depends on the app.  You're shooting for saturated CPUs and disk bandwidth.
>>  Check that the CPU is not saturated (after checking Hadoop tuning and
>> optimizing the number of tasks). Check that you have enough memory for more
>> tasks with room leftover for a large buffer cache.  Use 10 GbE networking
>> or make sure the network has enough headroom.  Check the storage controller
>> can handle more bandwidth.  If all are true (that is, no other
>> bottlenecks), consider adding more drives.
>> >
>> > Jeff
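The "(barely) worth it" judgment in Jeff's message can be worked through with his own numbers: 12 drives instead of 8 gave about 15% more terasort performance for about 12% more cost per machine, and no gain for teragen. The sketch below just computes the change in performance per unit cost; the function name `perf_per_cost_change` is an assumption for illustration, and the inputs are the approximate figures quoted in the message.

```python
def perf_per_cost_change(perf_gain, cost_increase):
    """Relative change in performance per unit cost.

    perf_gain and cost_increase are fractions, e.g. 0.15 for +15%.
    A positive result means the upgrade improves performance per dollar.
    """
    return (1.0 + perf_gain) / (1.0 + cost_increase) - 1.0


# terasort: +15% performance for +12% cost (figures from the message above)
terasort = perf_per_cost_change(0.15, 0.12)
# teragen: no measurable gain for the same +12% cost
teragen = perf_per_cost_change(0.0, 0.12)
print(f"terasort: {terasort:+.1%}, teragen: {teragen:+.1%}")
```

For terasort the result is only slightly above zero, which matches "barely worth it"; for teragen it is negative, so the extra drives pay off only if the job mix looks more like terasort.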
>> >
>> >> -----Original Message-----
>> >> From: Hank Cohen [mailto:[EMAIL PROTECTED]]
>> >> Sent: Friday, October 12, 2012 1:46 PM
>> >> Subject: RE: Spindle per Cores
>> >>
>> >> What empirical evidence is there for this rule of thumb?
>> >> In other words, what tests or metrics would indicate an optimal