Re: question of how to take full advantage of cluster resources
Please add your RAM details as well, as that matters for the number of
concurrently spawned JVMs.
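
For instance (purely illustrative numbers; the actual per-task heap
comes from your mapred.child.java.opts setting): at 1 GB of heap per
task JVM, 32 fully occupied slots on one node would need roughly 32 GB
of RAM by themselves, before counting the TaskTracker/DataNode daemons
and the OS page cache. If the node has less physical memory than that,
the extra task JVMs end up swapping or waiting rather than running in
parallel.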

On Sat, Dec 15, 2012 at 4:33 AM, Guang Yang <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We have a beefy Hadoop cluster with 12 worker nodes, each with 32
> cores. We have been running Map/Reduce jobs on this cluster and
> noticed that if we configure the Map/Reduce capacity to be less than
> the available processors in the cluster (32 x 12 = 384), say 216 map
> slots and 144 reduce slots (360 total), the jobs run okay. But if we
> configure the total Map/Reduce capacity to be more than 384, we
> observe that jobs sometimes run unusually long, and the symptom is
> that certain tasks (usually map tasks) are stuck in the
> "initializing" stage for a long time on certain nodes before getting
> processed. The nodes exhibiting this behavior are random and not tied
> to specific boxes. Isn't the general rule of thumb to configure M/R
> capacity at twice the number of processors in the cluster? What do
> people usually do to maximize the usage of cluster resources in terms
> of capacity configuration? I'd appreciate any responses.
>
> Thanks,
> Guang Yang

--
Harsh J
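
For reference, in MRv1 the per-TaskTracker slot counts come from
mapred-site.xml; a minimal sketch, assuming the 216/144 split above is
spread evenly across the 12 nodes (18 map and 12 reduce slots per
node):

  <!-- mapred-site.xml on each worker node; illustrative values -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>18</value>   <!-- 18 slots x 12 nodes = 216 map slots -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>   <!-- 12 slots x 12 nodes = 144 reduce slots -->
  </property>

Note these are per-node caps, not a cluster-wide total; each
TaskTracker reads its own copy of the file.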