MapReduce >> mail # user >> Spindle per Cores
Re: Spindle per Cores
Does hyperthreading affect this ratio?
On Oct 12, 2012 9:36 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:

> First, the obvious caveat... YMMV
>
> Having said that.
>
> The key here is to take a look across the various jobs that you will run.
> Some may be more CPU intensive, others more I/O intensive.
>
> If you monitor these jobs via Ganglia, when you have too few spindles you
> should see the wait cpu rise on the machines in the cluster.  That is to
> say that you are putting an extra load on the systems because you're
> waiting for the disks to catch up.
>
> If you increase the ratio of disks to CPU, you should see that load drop
> as you are not wasting CPU cycles.
>
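
As a rough illustration of the "wait cpu" signal described above, the sketch below computes the iowait share between two samples of the aggregate "cpu" line from Linux /proc/stat. The function name and the sample counter values are made up for the example; Ganglia reports a comparable figure on its own, so this only shows the arithmetic behind it.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percent of CPU time spent in iowait between two 'cpu' lines of /proc/stat."""
    def fields(line: str) -> list[int]:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(x) for x in parts[1:]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    # Column order on Linux: user nice system idle iowait irq softirq steal ...
    # Only the first eight columns are summed; the trailing guest columns
    # are assumed to be already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * deltas[4] / total


# Hypothetical samples, not measurements from the thread.
sample_before = "cpu 1000 0 500 8000 500 0 0 0 0 0"
sample_after = "cpu 1600 0 700 8400 1300 0 0 0 0 0"
wait = iowait_percent(sample_before, sample_after)
print(f"iowait: {wait:.1f}%")
```

A value that stays high while the disks are busy is the pattern described above: cores sitting idle until the spindles catch up.
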
> Note that it's not just the number of spindles, but also the bus and the
> controller cards that can affect the throughput of disk I/O.
>
> Now just IMHO, there was a discussion on some of the CPU recommendations.
> To a point, it doesn't matter that much. You want to maximize the bang for
> the buck you can get with your hardware purchase.
>
> Use the ratio as a buying guide. With a ratio below 1 disk per core,
> you're wasting the CPU that you bought.
>
> Going higher than a ratio of 1, like 1.5, and you may be buying too many
> spindles and not see a performance gain that offsets your cost.
>
> Search for a happy medium and don't sweat the maximum performance that you
> may get.
>
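
The buying guide above comes down to one division. Here is a minimal sketch, assuming the ratio is counted against physical cores (the thread does not settle whether hyperthreads should count); the function names and the wording of the notes are invented for the example.

```python
def spindles_per_core(disks: int, sockets: int, cores_per_socket: int) -> float:
    """Disks divided by physical cores (assumption: hyperthreads are not counted)."""
    return disks / (sockets * cores_per_socket)


def buying_note(ratio: float) -> str:
    """Restates the rule of thumb from the thread; the 1.0 threshold is a guide, not a law."""
    if ratio < 1.0:
        return "below 1: probably wasting CPU that was paid for"
    if ratio > 1.0:
        return "above 1: extra spindles may not pay for themselves"
    return "about 1: the suggested starting point"


# Configurations mentioned in the thread: 12-bay 2U boxes and an 8-drive setup.
for disks, sockets, cores in [(8, 2, 4), (12, 2, 4), (12, 2, 6)]:
    r = spindles_per_core(disks, sockets, cores)
    print(f"{disks} disks, {sockets}x{cores} cores -> {r:.2f} ({buying_note(r)})")
```
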
> HTH
>
> On Oct 12, 2012, at 4:19 PM, Jeffrey Buell <[EMAIL PROTECTED]> wrote:
>
> > I've done some experiments along these lines.  I'm using
> > high-performance 15K RPM SAS drives instead of the more usual SATA
> > drives, which should reduce the number of drives I need.  I have dual
> > 4-core processors at 3.6 GHz.  These are more powerful than the average
> > 4-core processor, which should increase the number of drives I need.
> > Assuming these two effects cancel, my results should also apply to
> > machines with SATA drives and average processors.
> >
> > Using 8 drives (1 per core) gets good performance for teragen and
> > terasort.  Going to 12 drives (1.5 per core) increases terasort
> > performance by 15%.  That might not seem like much compared to
> > increasing the number of drives by 50%, but a better comparison is that
> > 4 extra drives increased the cost of each machine by only about 12%, so
> > the extra drives are (barely) worth it.  If you're more time sensitive
> > than cost sensitive, then they're definitely worth it.  The extra drives
> > did not help teragen, apparently because both CPU and the internal
> > storage controller were close to saturation.
> >
> > So, of course everything depends on the app.  You're shooting for
> > saturated CPUs and disk bandwidth.  Check that the CPU is not saturated
> > (after checking Hadoop tuning and optimizing the number of tasks).
> > Check that you have enough memory for more tasks with room left over
> > for a large buffer cache.  Use 10 GbE networking or make sure the
> > network has enough headroom.  Check that the storage controller can
> > handle more bandwidth.  If all are true (that is, no other
> > bottlenecks), consider adding more drives.
> >
> > Jeff
> >
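
The "(barely) worth it" judgement above can be checked with a line of arithmetic. This sketch uses only the figures quoted in Jeff's message (8 to 12 drives, 15% faster terasort, about 12% higher machine cost); the function name is made up for the example, and performance per unit cost is just one possible yardstick.

```python
def perf_per_cost(perf_gain: float, cost_increase: float) -> float:
    """Relative performance per unit cost after an upgrade (1.0 means break-even)."""
    return (1.0 + perf_gain) / (1.0 + cost_increase)


drive_increase = (12 - 8) / 8            # 0.5, i.e. 50% more drives
value = perf_per_cost(0.15, 0.12)        # terasort figures from the message
print(f"drives: +{drive_increase:.0%}, performance per cost: x{value:.3f}")
```

The result comes out only slightly above 1.0, which matches the "barely" in the message. For teragen, where the extra drives gave no gain, the same formula yields a value below 1.0.
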
> >> -----Original Message-----
> >> From: Hank Cohen [mailto:[EMAIL PROTECTED]]
> >> Sent: Friday, October 12, 2012 1:46 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: RE: Spindle per Cores
> >>
> >> What empirical evidence is there for this rule of thumb?
> >> In other words, what tests or metrics would indicate an optimal
> >> spindle/core ratio and how dependent is this on the nature of the data
> >> and of the map/reduce computation?
> >>
> >> My understanding is that there are lots of clusters with more spindles
> >> than cores.  Specifically, typical 2U servers can hold 12 3.5" disk
> >> drives.  So lots of Hadoop clusters have dual 4 core processors and 12
> >> spindles.  Would it be better to have 6 core processors if you are
> >> loading up the boxes with 12 disks?  And most importantly, how would
> >> one know that the mix was optimal?
> >>
> >> Hank Cohen
> >> Altior Inc.
> >>
> >> -----Original Message-----