MapReduce, mail # user - question of how to take full advantage of cluster resources

Guang Yang 2012-12-14, 23:03
Jeffrey Buell 2012-12-14, 23:29
Re: question of how to take full advantage of cluster resources
Harsh J 2012-12-14, 23:12
Please include your RAM details as well, as that matters for the
number of concurrently spawned JVMs.
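Harsh's point can be illustrated with a rough sizing sketch: the combined heap of all concurrently running task JVMs has to fit in a node's physical RAM, which often caps the usable slot count well below 2x cores. All numbers below are illustrative assumptions, not values from this thread:

```python
# Rough per-node slot ceiling from RAM, not cores.
# All values are assumed for illustration, not taken from the thread.

node_ram_gb = 64     # physical RAM per worker node (assumed)
reserved_gb = 8      # OS + DataNode + TaskTracker daemons (assumed)
task_heap_gb = 1.5   # -Xmx per child task JVM, e.g. via mapred.child.java.opts

# Upper bound on concurrent map+reduce slots this node can host
# without pushing task JVMs into swap.
max_task_slots = int((node_ram_gb - reserved_gb) / task_heap_gb)
print(max_task_slots)  # prints 37 for these assumed numbers
```

With these assumed numbers, RAM allows only about 37 concurrent task JVMs on a 32-core node, so a 2x-cores slot count (64) would oversubscribe memory long before it oversubscribes CPU.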

On Sat, Dec 15, 2012 at 4:33 AM, Guang Yang <[EMAIL PROTECTED]> wrote:
> Hi,
> We have a beefy Hadoop cluster with 12 worker nodes, each with 32
> cores. We have been running Map/Reduce jobs on this cluster and we noticed
> that if we configure the Map/Reduce capacity in the cluster to be less than
> the number of available processors (32 x 12 = 384), say 216 map slots
> and 144 reduce slots (360 total), the jobs run okay. But if we configure the
> total Map/Reduce capacity to be more than 384, we observe that sometimes jobs
> run unusually long; the symptom is that certain tasks (usually map tasks)
> are stuck in the "initializing" stage for a long time on certain nodes before
> getting processed. The nodes exhibiting this behavior are random and not tied to
> specific boxes. Isn't the general rule of thumb to configure M/R capacity
> at twice the number of processors in the cluster? What do people usually
> do to maximize the usage of cluster resources in terms of capacity
> configuration? I'd appreciate any responses.
> Thanks,
> Guang Yang

Harsh J
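For reference, in MRv1 the cluster-wide slot capacity discussed above is the sum of per-TaskTracker settings in mapred-site.xml; for example, 18 map and 12 reduce slots per node across 12 nodes would give the 216/144 totals Guang describes. The values below are illustrative, not taken from the thread:

```xml
<!-- mapred-site.xml on each TaskTracker (MRv1); values are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>18</value> <!-- 18 map slots x 12 nodes = 216 cluster-wide -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>12</value> <!-- 12 reduce slots x 12 nodes = 144 cluster-wide -->
</property>
```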