Hadoop >> mail # user >> Sanity check re: value of 10GbE NICs for Hadoop?
Re: Sanity check re: value of 10GbE NICs for Hadoop?
Compare the cost of upgrading from 1GbE to 10GbE against the cost of adding
more nodes, using data from monitoring your cluster during peak load. It
should be clear which is the better value.
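One way to make that comparison concrete is a back-of-envelope cost model. The sketch below is a hypothetical model, not something proposed in the thread. It assumes an Amdahl-style speedup, where a 10x faster link only accelerates the fraction of job time that is network-bound (an optimistic upper bound, since disks or CPUs may become the limit next), and it assumes throughput scales linearly with node count. The names `ClusterOption`, `network_upgrade_speedup`, `more_nodes_speedup` and `cost_per_speedup`, and every cost in the example, are illustrative placeholders.

```python
from dataclasses import dataclass


@dataclass
class ClusterOption:
    """One hypothetical way to spend money on the cluster."""
    name: str
    cost: float     # total up-front cost (placeholder currency units)
    speedup: float  # estimated job-throughput speedup relative to today


def network_upgrade_speedup(net_fraction: float, bandwidth_ratio: float = 10.0) -> float:
    """Amdahl-style estimate: only the network-bound share of job time gets faster.

    net_fraction: share of peak-load job time spent waiting on the network (0..1).
    bandwidth_ratio: 10.0 for 1GbE -> 10GbE; treats the link as the only limit,
    so the result is an optimistic upper bound.
    """
    if not 0.0 <= net_fraction <= 1.0:
        raise ValueError("net_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - net_fraction) + net_fraction / bandwidth_ratio)


def more_nodes_speedup(current_nodes: int, added_nodes: int) -> float:
    """Assumes throughput scales linearly with node count (also optimistic)."""
    return (current_nodes + added_nodes) / current_nodes


def cost_per_speedup(option: ClusterOption) -> float:
    """Cost per unit of speedup gained above 1.0x; lower is the better value."""
    gain = option.speedup - 1.0
    return float("inf") if gain <= 0 else option.cost / gain


if __name__ == "__main__":
    # Placeholder inputs only: substitute real quotes and your own monitoring data.
    nodes = 20
    net_fraction = 0.25
    options = [
        ClusterOption("10GbE upgrade", 27_000.0, network_upgrade_speedup(net_fraction)),
        ClusterOption("4 more 1GbE nodes", 20_000.0, more_nodes_speedup(nodes, 4)),
    ]
    for opt in sorted(options, key=cost_per_speedup):
        print(f"{opt.name}: {opt.speedup:.2f}x, {cost_per_speedup(opt):,.0f} per unit of speedup")
```

With these placeholder numbers the upgrade comes out slightly ahead, but a smaller measured `net_fraction` (say 0.15) flips the result in favour of more nodes, which is why the peak-load monitoring data matters.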

Russ

On Tue, Jun 28, 2011 at 4:05 PM, Mathias Herberts <
[EMAIL PROTECTED]> wrote:

> On Wed, Jun 29, 2011 at 01:02, Matei Zaharia <[EMAIL PROTECTED]>
> wrote:
> > Ideally, to evaluate whether you want to go for 10GbE NICs, you would
> profile your target Hadoop workload and see whether it's
> communication-bound. Hadoop jobs can definitely be communication-bound if
> you shuffle a lot of data between map and reduce, but I've also seen a lot
> of clusters that are CPU-bound (due to decompression, running Python, or
> just running expensive user code) or disk-IO-bound. You might be surprised
> at what your bottleneck is.
>
> In my experience, jobs that shuffle lots of data are also very often
> slowed down by the sort phase; compressing the mappers' output is a first
> step toward improving performance. Given the cost of a 10GbE
> infrastructure with no oversubscription, I'd monitor bandwidth usage very
> closely before investing in that kind of network gear.
>
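Both Matei's suggestion to check whether the workload is communication-bound and Mathias's advice to monitor bandwidth before buying can start with per-node NIC counters. Below is a minimal Python sketch, assuming Linux nodes where `/proc/net/dev` exposes cumulative per-interface byte counters. The function names, the interface name `eth0`, the script name and the 1 Gbit/s default link speed are all assumptions for illustration.

```python
import sys
import time
from typing import Dict, Tuple

# Assumed default: 1 Gbit/s link, i.e. the 1GbE NICs being evaluated.
DEFAULT_LINK_BPS = 1e9


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Parse Linux /proc/net/dev text into {interface: (rx_bytes, tx_bytes)}."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before: Tuple[int, int], after: Tuple[int, int],
                interval_s: float, link_bps: float = DEFAULT_LINK_BPS) -> Tuple[float, float]:
    """Fraction of link capacity used for (receive, transmit) over the interval."""
    rx = (after[0] - before[0]) * 8 / interval_s / link_bps
    tx = (after[1] - before[1]) * 8 / interval_s / link_bps
    return rx, tx


def read_counters(iface: str) -> Tuple[int, int]:
    with open("/proc/net/dev") as f:
        return parse_net_dev(f.read())[iface]


def sample(iface: str, interval_s: float = 5.0,
           link_bps: float = DEFAULT_LINK_BPS) -> Tuple[float, float]:
    """Take two counter readings about interval_s apart and return utilization."""
    t0, before = time.monotonic(), read_counters(iface)
    time.sleep(interval_s)
    t1, after = time.monotonic(), read_counters(iface)
    return utilization(before, after, t1 - t0, link_bps)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python netutil.py eth0  (run on each node during a peak shuffle;
    # Ctrl-C to stop)
    while True:
        rx, tx = sample(sys.argv[1])
        print(f"rx {rx:6.1%}  tx {tx:6.1%}")
```

Receive or transmit fractions that stay close to 1.0 through the shuffle suggest the 1GbE links really are the bottleneck; consistently low figures point toward CPU or disk instead. Per-node counters will not reveal contention on an oversubscribed switch uplink, so switch port statistics are worth checking as well.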