HDFS >> mail # user >> Re: What happens when you have fewer input files than mapper slots?


Which version of Hadoop are you using? MRv1 or MRv2 (YARN)?

For MRv2 (YARN), you can pretty much achieve this using:

yarn.nodemanager.resource.memory-mb (system-wide setting)
and
mapreduce.map.memory.mb (job-level setting)

e.g. if yarn.nodemanager.resource.memory-mb=100
and mapreduce.map.memory.mb=40,
a maximum of two mappers can run on a node at any time.
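
As a minimal sketch of how those two settings might be wired up (the 100/40
values are just the toy numbers from above; the jar and class names are made
up for illustration):

In yarn-site.xml on each NodeManager:

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>100</value>   <!-- total memory YARN may hand out on this node -->
  </property>

And per job, assuming your driver goes through ToolRunner/GenericOptionsParser
so -D properties are picked up:

  hadoop jar my-job.jar com.example.JobB \
      -D mapreduce.map.memory.mb=40 \
      /input /output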

For MRv1, the equivalent way is to control the number of mapper slots on each
machine with mapred.tasktracker.map.tasks.maximum; of course, this does not
give you per-job control over mappers.
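
A minimal sketch for the MRv1 case, in mapred-site.xml on each TaskTracker
(setting the value to 1 would give the "one mapper per node" behavior, but for
every job, which is exactly the lack of per-job control mentioned above):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>   <!-- map slots on this TaskTracker; applies to all jobs -->
  </property>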

In addition, in both cases you can use a scheduler with 'pools / queues'
capability to restrict the overall use of grid resources. Do read the Fair
Scheduler and Capacity Scheduler documentation...
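
For example, with the MRv1 Fair Scheduler you could put the CPU-heavy job in
its own pool and cap its concurrent map tasks across the cluster. This is only
a sketch: the pool name and the 20-task cap (one per node on a 20-node
cluster) are illustrative, and the cap is cluster-wide rather than per node,
so it only approximates "one mapper per node":

  <?xml version="1.0"?>
  <allocations>
    <pool name="jobB">
      <maxMaps>20</maxMaps>   <!-- at most 20 map tasks from this pool at once -->
    </pool>
  </allocations>

Depending on your version, you would then submit Job B with something like
mapred.fairscheduler.pool=jobB (or whatever poolnameproperty your scheduler is
configured with) so it lands in that pool.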
-Rahul
On Tue, Mar 19, 2013 at 1:55 PM, jeremy p <[EMAIL PROTECTED]> wrote:

> Short version : let's say you have 20 nodes, and each node has 10 mapper
> slots.  You start a job with 20 very small input files.  How is the work
> distributed to the cluster?  Will it be even, with each node spawning one
> mapper task?  Is there any way of predicting or controlling how the work
> will be distributed?
>
> Long version : My cluster is currently used for two different jobs.  The
> cluster is currently optimized for Job A, so each node has a maximum of 18
> mapper slots.  However, I also need to run Job B.  Job B is VERY
> cpu-intensive, so we really only want one mapper to run on a node at any
> given time.  I've done a bunch of research, and it doesn't seem like Hadoop
> gives you any way to set the maximum number of mappers per node on a
> per-job basis.  I'm at my wit's end here, and considering some rather
> egregious workarounds.  If you can think of anything that can help me, I'd
> very much appreciate it.
>
> Thanks!
>
> --Jeremy
>