Hadoop, mail # user - More cores Vs More Nodes ?


RE: More cores Vs More Nodes ?
Michael Segel 2011-12-14, 17:05


Brian,

I think you missed my point.

The moment you go and design a cluster for a specific job, you end up getting fscked because there's another group who wants to use the shared resource for their job, which could be orthogonal to the original purpose. It happens every day.

This is why you have to ask whether the cluster is being built for a specific purpose, that is, answer the question "Which of the following best describes your cluster?":
a) PoC
b) Development
c) Pre-prod
d) Production
e) Secondary/Backup

Note that sizing the cluster is a different matter.
If you know you need a PB of storage, you're going to design the cluster differently: once you get to a certain size, you have to recognize that your clusters are going to have lots of disk and will require 10GbE just for the storage. The number of cores becomes less of an issue; however, again, look at pricing. 2-socket, 8-core Xeon motherboards are currently at the optimal price point.
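As a rough illustration of the storage arithmetic, here is a minimal Python sketch. It assumes HDFS's default replication factor of 3, a hypothetical 24 TB of raw disk per node (e.g. 12 x 2 TB), and a hypothetical 25% of raw space held back for intermediate and temporary data; none of these figures come from the thread.

```python
import math


def nodes_for_capacity(usable_tb: float, raw_tb_per_node: float,
                       replication: int = 3,
                       reserve_fraction: float = 0.25) -> int:
    """Estimate how many data nodes are needed to hold `usable_tb` of unique data.

    replication: HDFS block replication factor (3 is the HDFS default).
    reserve_fraction: hypothetical share of raw disk kept free for
    intermediate/temp output; an assumption, not a rule.
    """
    # Every block is stored `replication` times, and only part of each disk is usable.
    raw_needed_tb = usable_tb * replication / (1.0 - reserve_fraction)
    # Round up: a fraction of a node still means buying a whole node.
    return math.ceil(raw_needed_tb / raw_tb_per_node)


# 1 PB (taken as 1000 TB) of data on hypothetical 24 TB nodes.
print(nodes_for_capacity(1000, 24))  # 167
```

At that scale the node count is driven by disk per node and replication, not by core count.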

And again, this goes back to the point I was trying to make.
You need to look beyond the number of cores as the determining factor.
If you go too small, you're going to take a hit because of the price/performance curve.
(Remember that you have to consider machine room real estate. 100 2-core boxes take up much more space than 25 8-core boxes.)
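The real-estate point is simple arithmetic. The sketch below compares the two build-outs from the parenthetical; the chassis heights (1U for a 2-core box, 2U for an 8-core box) are hypothetical assumptions, not figures from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildOut:
    """One candidate cluster made of identical boxes."""
    boxes: int
    cores_per_box: int
    rack_units_per_box: int  # hypothetical chassis height, not from the thread

    @property
    def total_cores(self) -> int:
        return self.boxes * self.cores_per_box

    @property
    def total_rack_units(self) -> int:
        return self.boxes * self.rack_units_per_box


small = BuildOut(boxes=100, cores_per_box=2, rack_units_per_box=1)
dense = BuildOut(boxes=25, cores_per_box=8, rack_units_per_box=2)

for name, b in (("100 x 2-core", small), ("25 x 8-core", dense)):
    print(f"{name}: {b.total_cores} cores, {b.total_rack_units}U")
```

Both build-outs have 200 cores, but the first means four times as many boxes to rack, cable, power, and administer.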

If you go to the other extreme, a 64-core giant SMP box costs $$$$$, while for $$$ (less money) you can build out an 8-node cluster.

Beyond that, you really, really don't want to build a custom cluster for a specific job unless you know that you're going to be running that specific job or set of jobs 24x7x365. [And yes, I came across such a use case...]

HTH

-Mike
> From: [EMAIL PROTECTED]
> Subject: Re: More cores Vs More Nodes ?
> Date: Wed, 14 Dec 2011 07:41:25 -0600
> To: [EMAIL PROTECTED]
>
> Actually, there are varying degrees here.
>
> If you have a successful project, you will find other groups at your door wanting to use the cluster too.  Their jobs might be different from the original use case.
>
> However, if you don't understand the original use case ("CPU heavy or storage heavy?" is a great beginning question), your original project won't be successful.  Then there will be no follow-up users because you failed.
>
> So, you want to have a reasonably general-purpose cluster, but make sure it matches well with the type of jobs.  As an example, we had one group who required an estimated CPU-millennia per byte of data… they needed a "general purpose cluster" for a certain value of "general purpose".
>
> Brian
>
> On Dec 14, 2011, at 7:29 AM, Michael Segel wrote:
>
> >
> > Aw Tommy,
> > Actually no. You really don't want to do this.
> >
> > If you actually ran a cluster and worked in the real world, you would find that if you purposely build a cluster for one job, there will be a mandate that some other group needs to use the cluster, that their job has different performance issues, and that your cluster is now suboptimal for their jobs...
> >
> > Perhaps you meant that you needed to think about the purpose of the cluster? That is, do you want to minimize the nodes but maximize the disk space per node and use the cluster as your backup cluster? (Assuming that you are considering your DR and BCP in your design.)
> >
> > The problem with your answer is that a job has a specific meaning within the Hadoop world.  You should have asked what the purpose of the cluster is.
> >
> > I agree with Brad that it depends...
> >
> > But the factors which will impact your cluster design are more along the lines of the purpose of the cluster and then the budget along with your IT constraints.
> >
> > IMHO it's better to avoid building purpose-built clusters. You end up not being able to easily recycle the hardware into new clusters.
> >
> > But hey what do I know? ;-)
> >
> >> To: [EMAIL PROTECTED]
> >> Subject: RE: More cores Vs More Nodes ?
> >> From: [EMAIL PROTECTED]
> >> Date: Tue, 13 Dec 2011 09:46:49 -0800
> >>
> >> It also helps to know the profile of your job in how you spec the