Hadoop, mail # user - More cores Vs More Nodes ?

RE: More cores Vs More Nodes ?
Michael Segel 2011-12-14, 13:29

Aw Tommy,
Actually no. You really don't want to do this.

If you actually ran a cluster and worked in the real world, you would find that once you purposely build a cluster for one job, there will be a mandate that some other group needs to use it. Their jobs will have different performance characteristics, and your cluster is now suboptimal for them...

Perhaps you meant that you needed to think about the purpose of the cluster? That is, do you want to minimize the number of nodes but maximize the disk space per node, and use the cluster as your backup cluster? (Assuming that you are considering your DR (disaster recovery) and BCP (business continuity planning) in your design.)

The problem with your answer is that "job" has a specific meaning within the Hadoop world. You should have asked what the purpose of the cluster is.

I agree with Brad that it depends...

But the factors that will impact your cluster design are more along the lines of the purpose of the cluster, then the budget, along with your IT constraints.

IMHO it's better to avoid building purpose-built clusters. You end up not being able to easily recycle the hardware into new clusters.

But hey what do I know? ;-)

> Subject: RE: More cores Vs More Nodes ?
> Date: Tue, 13 Dec 2011 09:46:49 -0800
> It also helps to know the profile of your job in how you spec the
> machines. So in addition to Brad's response you should consider if you
> think your jobs will be more storage or compute oriented.
> ------------------------------------------------
> Tom Deutsch
> Program Director
> Information Management
> Big Data Technologies
> 3565 Harbor Blvd
> Costa Mesa, CA 92626-1420
> Brad Sarsfield <[EMAIL PROTECTED]>
> 12/13/2011 09:41 AM
> Subject: RE: More cores Vs More Nodes ?
> Praveenesh,
> Your question is not naïve; in fact, which hardware design would be
> "better" can ultimately be a very difficult question to answer. If you
> made me pick one without much information I'd go for more machines.
> But...
> It all depends, and there is no right answer... :)
> More machines
>   + May run your workload faster
>   + Will give you a higher degree of reliability protection from
>     node / hardware / hard drive failure
>   + More aggregate IO capabilities
>   - Capex / opex may be higher than allocating more cores
> More cores
>   + May run your workload faster
>   + More cores may allow for more tasks to run on the same machine
>   + More cores/tasks may reduce network contention and increase
>     task-to-task data flow performance
> Notice "May run your workload faster" is in both, as it can be very
> workload dependent.
> My Experience:
> I did a recent experiment with the same number of cores (64) and the
> exact same network / machine configuration:
>   A: 8 machines with 8 cores each
>   B: 28 machines with 2 cores each (and 1 x 8-core head node)
> B was able to outperform A by 2x using teragen and terasort. These
> machines were running in a virtualized environment, where some of the IO
> capabilities behind the scenes were being regulated to 400 Mbps per node
> when running in the 2-core configuration vs 1 Gbps on the 8-core. So I
> would expect the non-throttled scenario to work even better.
> ~Brad
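
To put rough numbers on Brad's two configurations, here is a minimal Python sketch of the arithmetic. The node counts, core counts and per-node IO caps are taken from his message; the `ClusterConfig` class and its method names are hypothetical, made up purely for illustration. It assumes the caps apply uniformly to every worker node and leaves the head node's bandwidth out of the aggregate, neither of which the thread confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical helper describing one cluster layout from the thread."""
    name: str
    worker_nodes: int
    cores_per_node: int
    io_cap_mbps_per_node: int  # per-node IO cap quoted in the thread
    head_node_cores: int = 0   # cores on a separate head node, if any

    def total_cores(self) -> int:
        # Worker cores plus any head-node cores.
        return self.worker_nodes * self.cores_per_node + self.head_node_cores

    def aggregate_io_mbps(self) -> int:
        # Assumption: the cap applies to each worker; head node excluded.
        return self.worker_nodes * self.io_cap_mbps_per_node

    def io_mbps_per_core(self) -> float:
        # IO budget available to each core on a single worker node.
        return self.io_cap_mbps_per_node / self.cores_per_node


# A: 8 machines x 8 cores, 1 Gbps per node.
config_a = ClusterConfig("A", worker_nodes=8, cores_per_node=8,
                         io_cap_mbps_per_node=1000)
# B: 28 machines x 2 cores, 400 Mbps per node, plus one 8-core head node.
config_b = ClusterConfig("B", worker_nodes=28, cores_per_node=2,
                         io_cap_mbps_per_node=400, head_node_cores=8)

for cfg in (config_a, config_b):
    print(f"{cfg.name}: cores={cfg.total_cores()}, "
          f"aggregate IO={cfg.aggregate_io_mbps()} Mbps, "
          f"IO per core={cfg.io_mbps_per_core():.0f} Mbps")
```

Under those assumptions both layouts come to 64 cores, but B has 11,200 Mbps of aggregate capped IO against 8,000 Mbps for A, and 200 Mbps per core against 125 Mbps. That is consistent with Brad's "more aggregate IO" point, though it does not by itself explain the 2x difference he measured.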
> -----Original Message-----
> From: praveenesh kumar [mailto:[EMAIL PROTECTED]]
> Sent: Monday, December 12, 2011 8:51 PM
> Subject: More cores Vs More Nodes ?
> Hey Guys,
> So I have a very naive question regarding Hadoop cluster nodes:
> more cores or more nodes - Shall I spend money on going from 2-4 core