Hadoop >> mail # user >> Sanity check re: value of 10GbE NICs for Hadoop?


Re: Sanity check re: value of 10GbE NICs for Hadoop?
Price the cost of 1GbE->10GbE vs. more nodes, using data from monitoring
your cluster during peak load.  It should be clear which is a better value.

Russ
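
One rough way to put numbers on that comparison is sketched below. It is only a back-of-the-envelope model, not anything proposed in the thread: it assumes an Amdahl-style estimate where only the network-bound share of runtime speeds up with faster links, and it assumes (optimistically) that adding nodes scales throughput linearly. The function names and all prices are hypothetical; the network-bound fraction is meant to come from your own peak-load monitoring.

```python
def speedup_from_faster_network(network_bound_fraction, link_speedup=10.0):
    """Amdahl-style estimate: only the network-bound share of runtime shrinks."""
    remaining = (1.0 - network_bound_fraction) + network_bound_fraction / link_speedup
    return 1.0 / remaining


def speedup_from_more_nodes(current_nodes, extra_nodes):
    """Optimistic assumption: throughput scales linearly with node count."""
    return (current_nodes + extra_nodes) / current_nodes


def cost_per_percent_gain(cost, speedup):
    """Cost of each percentage point of extra throughput (inf if no gain)."""
    gain_percent = (speedup - 1.0) * 100.0
    return float("inf") if gain_percent <= 0 else cost / gain_percent


# Hypothetical inputs: replace with measured and quoted values.
network_bound_fraction = 0.5   # share of peak-load runtime spent waiting on the network
upgrade_cost = 40000.0         # made-up price of the 1GbE -> 10GbE upgrade
node_cost = 5000.0             # made-up price per additional node
current_nodes, extra_nodes = 20, 5

network_option = cost_per_percent_gain(
    upgrade_cost, speedup_from_faster_network(network_bound_fraction))
nodes_option = cost_per_percent_gain(
    extra_nodes * node_cost, speedup_from_more_nodes(current_nodes, extra_nodes))
print("10GbE upgrade: %.0f per percent gained" % network_option)
print("More nodes:    %.0f per percent gained" % nodes_option)
```

If the measured network-bound fraction is close to zero, the first figure goes to infinity, which is the case where the upgrade buys nothing.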

On Tue, Jun 28, 2011 at 4:05 PM, Mathias Herberts <[EMAIL PROTECTED]> wrote:

> On Wed, Jun 29, 2011 at 01:02, Matei Zaharia <[EMAIL PROTECTED]> wrote:
> > Ideally, to evaluate whether you want to go for 10GbE NICs, you would
> > profile your target Hadoop workload and see whether it's
> > communication-bound. Hadoop jobs can definitely be communication-bound if
> > you shuffle a lot of data between map and reduce, but I've also seen a lot
> > of clusters that are CPU-bound (due to decompression, running Python, or
> > just running expensive user code) or disk-IO-bound. You might be surprised
> > at what your bottleneck is.
>
> From my experience, jobs that shuffle lots of data are also very often
> slowed down by the sort phase; compressing the mappers' output is a first
> step to improve performance. Given the cost of a 10GbE infrastructure
> with no oversubscription, I'd monitor bandwidth usage very closely before
> investing in that kind of network gear.
>
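
As a minimal sketch of the bandwidth monitoring suggested above, the following compares two snapshots of the Linux /proc/net/dev counters and reports how much of the link each interface used over the interval. The function names are hypothetical, the 1 Gbit/s default link speed is an assumption to adjust for your NICs, and counter wrap-around between samples is ignored.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 9:
            continue  # not a counter line we understand
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def link_utilization(before, after, interval_s, link_bits_per_s=1e9):
    """Return {interface: (rx_fraction, tx_fraction)} of link capacity used."""
    start, end = parse_net_dev(before), parse_net_dev(after)
    usage = {}
    for name, (rx_end, tx_end) in end.items():
        if name not in start:
            continue  # interface appeared between the two samples
        rx_start, tx_start = start[name]
        rx_bits_per_s = (rx_end - rx_start) * 8 / interval_s
        tx_bits_per_s = (tx_end - tx_start) * 8 / interval_s
        usage[name] = (rx_bits_per_s / link_bits_per_s,
                       tx_bits_per_s / link_bits_per_s)
    return usage
```

On a Linux node, the two snapshots would be the contents of /proc/net/dev read some seconds apart during peak load, with the elapsed time passed as interval_s. Fractions that stay well below 1.0 on a 1GbE link suggest the job is bound by something other than the network.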