RE: More cores Vs More Nodes ?


Again, I think you really need some real-world experience before you make generalizations like that.

Sorry, but at a client, we put 6 different groups' applications in production. Without going into detail, the jobs in production were orthogonal to one another. The point is that had we built our cluster optimized for one job, we would have been screwed. Oh wait, I forgot that you worked for IBM and they would love to sell you more hardware and consulting to improve the situation... (I kee-id, I kee-id)

Now seriously,
The point of this discussion is that you really, really don't want to build the cluster optimized for a single job.
The only time you want to do that is if you have a job or set of jobs that you plan on running every day 24x7 and the job takes the entire cluster.
Yes, such jobs do exist. However they are highly irregular and definitely not the norm.

One of the other pain points is that developers have to get used to the cluster as a shared resource used by different teams. Sharing helps to defray the costs, including maintenance. So for a resource shared between development and production, you need to build out a box that handles everything equally well.

Had you attended our session at Hadoop World, not only would you have learned this (don't tune the cluster to the application; tune the application to the cluster), but I would also have poked fun at you in person. ;-)

We also talked about avoiding the internet myths and 'truisms'.

Once you've gotten your hands dirty at customers' sites, you're going to find the real world is a different place. ;-)
But hey! What do I know?
> Subject: RE: More cores Vs More Nodes ?
> Date: Wed, 14 Dec 2011 07:56:30 -0800
> Putting aside any smarmy responses for a moment - sorry that "job(s)"
> wasn't understood as equating to "purpose".
> If you are building a general purpose sandbox then I think we all agree on
> building a "balanced" general purpose cluster. But if you have production
> use cases in mind then you darn well better try to understand how the
> cluster will be used/stressed so you don't end up with a hardware spec
> that doesn't match how the cluster is actually used.
> If you can't profile a production use case as to how it will stress the
> cluster that is a huge warning sign as to project risk. If you are tearing
> down and re-purposing a cluster that was implemented to support a
> production use case then the planning failed.
> ------------------------------------------------
> Tom Deutsch
> Program Director
> Information Management
> Big Data Technologies
> 3565 Harbor Blvd
> Costa Mesa, CA 92626-1420