MapReduce >> mail # user >> Why they recommend this (CPU) ?


Re: Why they recommend this (CPU) ?
Without a doubt, there are many CPU intensive workloads where the amount of
CPU cycles consumed to process some amount of data is many times higher
than what would be considered relatively normal.  But at the same time,
there are many memory intensive workloads and IO bound workloads that are
common as well.  I've worked with companies that run all three on a
single cluster, which is another point to be aware of.

Unless you are building a single-application, single-purpose cluster,
you'll probably have a mix of jobs with a mix of resource profiles.  So
designing a cluster so that your CPU-heavy job runs faster may mean you
skimped on spindles or disk speed, and when you later run a new application
alongside it as a mixed workload, you end up with a bottleneck on the IO side.

So keep in mind not just the profile of one specific workload, but the
profile of all the work you want to support on the cluster in general.
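
As a rough illustration of what "the profile of a workload" can mean in practice, here is a minimal single-machine sketch in Python. It compares wall-clock time with CPU time for one run of a task and labels the run by the share of time spent on the CPU. Everything in it is an assumption made for illustration: the helper names (profile, classify, spin, wait), the 0.5 threshold, and the use of sleep as a stand-in for blocking IO. It is not a Hadoop tool, it says nothing about memory pressure, and it is no substitute for measuring real jobs on a real cluster.

```python
import time


def profile(task):
    """Run task() once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def classify(wall, cpu, threshold=0.5):
    """Label a run by the fraction of wall time spent on the CPU.

    The threshold is an arbitrary illustrative cut-off, not a standard value.
    """
    if wall <= 0:
        return "unknown"
    if cpu / wall >= threshold:
        return "cpu-bound"
    return "waiting (IO or other)"


def spin():
    # Hypothetical CPU-heavy task: pure arithmetic, no blocking calls.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def wait():
    # Hypothetical stand-in for blocking IO: the process sleeps, using no CPU.
    time.sleep(0.2)


if __name__ == "__main__":
    for name, task in [("spin", spin), ("wait", wait)]:
        wall, cpu = profile(task)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s -> {classify(wall, cpu)}")
```

Running the same comparison over each kind of job you expect on the cluster gives a first, crude picture of the mix before you commit to hardware.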

On Thu, Oct 11, 2012 at 12:03 PM, Russell Jurney
<[EMAIL PROTECTED]> wrote:

> My own clusters are too temporary and virtual for me to notice. I haven't
> thought of clock speed as having mattered in a long time, so I'm curious
> what kind of use cases might benefit from faster cores. Is there a category
> in some way where this sweet spot for faster cores occurs?
>
> Russell Jurney http://datasyndrome.com
>
> On Oct 11, 2012, at 11:39 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> You should measure your workload.  Your experience will vary dramatically
> with different computations.
>
> On Thu, Oct 11, 2012 at 10:56 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Anyone got data on this? This is interesting, and somewhat
>> counter-intuitive.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Oct 11, 2012, at 10:47 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>>
>> > Presumably, if you have a reasonable number of cores - speeding the
>> > cores up will be better than forking a task into smaller and smaller chunks
>> > - because at some point the overhead of multiple processes would be a
>> > bottleneck - maybe due to streaming reads and writes?  I'm sure each and
>> > every problem has a different sweet spot.
>>
>
>