Hadoop >> mail # user >> More cores Vs More Nodes ?


praveenesh kumar 2011-12-13, 04:50
Brad Sarsfield 2011-12-13, 17:41
Prashant Kommireddi 2011-12-13, 17:46
Brad Sarsfield 2011-12-14, 01:15
He Chen 2011-12-14, 02:25
Michael Segel 2011-12-14, 13:15
Russell Jurney 2011-12-14, 21:27
Brad Sarsfield 2011-12-15, 05:50
Michel Segel 2011-12-17, 14:57
Tom Deutsch 2011-12-13, 17:46
bharath vissapragada 2011-12-13, 17:59
Michael Segel 2011-12-14, 13:29
Brian Bockelman 2011-12-14, 13:41
Michael Segel 2011-12-14, 17:05
Scott Carey 2011-12-14, 18:37
Tom Deutsch 2011-12-14, 15:56
Michael Segel 2011-12-14, 17:33
Tom Deutsch 2011-12-14, 19:40
Michael Segel 2011-12-15, 20:57
real great.. 2011-12-13, 17:45
Re: More cores Vs More Nodes ?
More nodes means more aggregate read IO during the map step.
If you use combiners, you might need to send only a small amount of data
over the network to the reducers.
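
To illustrate the combiner point, here is a minimal sketch in plain Python (not Hadoop code). It simulates a word count and counts how many records would cross the network with and without local combining. The function names and the two input splits are made up for illustration.

```python
from collections import Counter

def map_words(split):
    """Map step: emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for word in split.split()]

def combine(pairs):
    """Combiner: sum the counts per word locally, on the mapper's node."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def shuffled_records(splits, use_combiner):
    """Count the records that would be sent to the reducers."""
    total = 0
    for split in splits:
        pairs = map_words(split)
        if use_combiner:
            pairs = combine(pairs)
        total += len(pairs)
    return total

# Hypothetical input: two splits, one per mapper.
SPLITS = ["a b a a b", "b b c b"]

without_combiner = shuffled_records(SPLITS, use_combiner=False)
with_combiner = shuffled_records(SPLITS, use_combiner=True)
print(without_combiner, with_combiner)  # prints "9 4"
```

The saving depends on how often keys repeat within a single mapper's output, so the real reduction is workload dependent.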

Alexander
On Tue, Dec 13, 2011 at 12:45 PM, real great.. <[EMAIL PROTECTED]> wrote:

> More cores might help in Hadoop environments, as there would be more data
> locality.
> Your thoughts?
>
> On Tue, Dec 13, 2011 at 11:11 PM, Brad Sarsfield <[EMAIL PROTECTED]> wrote:
>
> > Praveenesh,
> >
> > Your question is not naïve; in fact, which hardware design would be
> > "better" can ultimately be a very difficult question to answer. If you
> > made me pick one without much information, I'd go for more machines.
> > But...
> >
> > It all depends; and there is no right answer.... :)
> >
> > More machines
> >        +May run your workload faster
> >        +Will give you a higher degree of reliability protection from node
> > / hardware / hard drive failure.
> >        +More aggregate IO capabilities
> >        - capex / opex may be higher than allocating more cores
> > More cores
> >        +May run your workload faster
> >        +More cores may allow for more tasks to run on the same machine
> >        +More cores/tasks may reduce network contention and increase
> > task-to-task data flow performance.
> >
> > Notice "May run your workload faster" is in both; as it can be very
> > workload dependent.
> >
> > My Experience:
> > I did a recent experiment and found that given the same number of cores
> > (64) with the exact same network / machine configuration;
> >        A: I had 8 machines with 8 cores
> >        B: I had 28 machines with 2 cores (and 1x8 core head node)
> >
> > B was able to outperform A by 2x using teragen and terasort. These
> > machines were running in a virtualized environment; where some of the IO
> > capabilities behind the scenes were being regulated to 400Mbps per node
> > when running in the 2 core configuration vs 1Gbps on the 8 core.  So I
> > would expect the non-throttled scenario to work even better.
> >
> > ~Brad
> >
> >
> > -----Original Message-----
> > From: praveenesh kumar [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, December 12, 2011 8:51 PM
> > To: [EMAIL PROTECTED]
> > Subject: More cores Vs More Nodes ?
> >
> > Hey Guys,
> >
> > So I have a very naive question regarding Hadoop cluster nodes.
> >
> > More cores or more nodes: should I spend money on going from 2-core to
> > 4-core machines, or on buying more nodes with fewer cores each, e.g. 2
> > machines with 2 cores?
> >
> > Thanks,
> > Praveenesh
> >
> >
>
>
> --
> Regards,
> R.V.
>
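
The figures in Brad's experiment quoted above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation under stated assumptions: the quoted IO caps apply uniformly to every worker node, 1 Gbps is taken as 1000 Mbps, all 8 machines in configuration A are workers, and the head node in B is counted for cores but not for bandwidth. The function name is hypothetical.

```python
def cluster_totals(workers, cores_per_worker, mbps_per_worker, head_cores=0):
    """Return (total cores, aggregate worker IO cap in Mbps)."""
    total_cores = workers * cores_per_worker + head_cores
    aggregate_mbps = workers * mbps_per_worker
    return total_cores, aggregate_mbps

# A: 8 machines with 8 cores, capped at 1 Gbps each (assumed 1000 Mbps).
config_a = cluster_totals(workers=8, cores_per_worker=8, mbps_per_worker=1000)

# B: 28 machines with 2 cores, capped at 400 Mbps each, plus one 8-core head node.
config_b = cluster_totals(workers=28, cores_per_worker=2, mbps_per_worker=400,
                          head_cores=8)

print(config_a)  # prints "(64, 8000)"
print(config_b)  # prints "(64, 11200)"
```

Both configurations come to 64 cores, and under these assumptions B still has the higher aggregate cap even with per-node throttling. That is consistent with the "more aggregate IO capabilities" point in the list, but it does not by itself account for the reported 2x difference.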