Short version: let's say you have 20 nodes, and each node has 10 mapper
slots. You start a job with 20 very small input files. How is the work
distributed across the cluster? Will it be even, with each node spawning
one mapper task? Is there any way to predict or control how the work
will be distributed?
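For concreteness, here is a rough model of how input files become map tasks under the default FileInputFormat (assuming Hadoop 1.x behavior and a 64 MB block size); which node each task actually lands on then depends on the scheduler and data locality, so an even spread is not guaranteed:

```python
# Sketch (assumption: default FileInputFormat, Hadoop 1.x, 64 MB blocks):
# each file smaller than one HDFS block yields exactly one input split,
# and the number of map tasks equals the number of splits.
def num_map_tasks(file_sizes, block_size=64 * 1024 * 1024):
    splits = 0
    for size in file_sizes:
        # ceil-divide: a file spanning multiple blocks gets one split per block
        splits += max(1, -(-size // block_size))
    return splits

# 20 small files -> 20 map tasks
print(num_map_tasks([4096] * 20))  # 20
```

So with 20 tiny files the job gets 20 map tasks regardless of how many slots each node has; whether those 20 tasks spread one-per-node is up to the scheduler.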
Long version: my cluster is currently shared between two different jobs.
The cluster is tuned for Job A, so each node has a maximum of 18 mapper
slots. However, I also need to run Job B. Job B is VERY CPU-intensive, so
we really only want one of its mappers running on a node at any given
time. I've done a bunch of research, and it doesn't seem like Hadoop
gives you any way to cap the number of mappers per node on a per-job
basis. I'm at my wit's end here, and considering some rather egregious
workarounds. If you can think of anything that might help, I'd very much
appreciate it.
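For reference, the knob I keep running into (assuming Hadoop 1.x / MR1 property names) is per-TaskTracker, not per-job, which is exactly the problem:

```xml
<!-- mapred-site.xml: caps concurrent map tasks per TaskTracker,
     but it applies to every job on the cluster, not just Job B -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>18</value>
</property>
```

Dropping this to 1 would fix Job B but cripple Job A, so a cluster-wide change isn't an option for me.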