If you're talking in per-machine slot terms, this is possible with the Capacity Scheduler: set a memory requirement worth 4 slots for your job, and the scheduler will reserve 4 slots on a single TaskTracker to run one task.
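As a sketch of what that job-level setting could look like (property names from MR1's memory-based scheduling; the 1024 MB per-slot figure is an assumption — check your cluster's actual mapred.cluster.map.memory.mb value):

```xml
<!-- Job configuration: request 4x the cluster's per-slot memory so the
     Capacity Scheduler reserves 4 map slots for each task.
     Assumes the cluster sets mapred.cluster.map.memory.mb to 1024. -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>4096</value>
</property>
```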
If you are instead asking for a way to run all tasks in parallel (across machines) or not at all, rather than one by one, that's not directly possible via the MR framework. You could, however, make each task block on a condition of your own and only let them all proceed once every one has entered the running state; ZooKeeper is a good fit for implementing such a barrier. Alternatively, consider the YARN framework, which gives you more granular control over task execution flow if you need that.
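The gating logic looks like this, sketched locally with a CountDownLatch (an assumption for illustration — across machines, ZooKeeper would play this role, e.g. each task creating an ephemeral znode under a barrier path and waiting until the child count reaches N):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StartBarrier {
    // Each "task" announces it is running, then blocks until all n have
    // announced; only then does any of them proceed to do real work.
    static int runAllTogether(int n) throws InterruptedException {
        CountDownLatch allArrived = new CountDownLatch(n);
        AtomicInteger proceeded = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                allArrived.countDown();      // "I have entered running state"
                try {
                    allArrived.await();      // barrier: wait for all n tasks
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                proceeded.incrementAndGet(); // real map work would start here
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return proceeded.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAllTogether(4));
    }
}
```

With ZooKeeper the await step becomes a watch on the barrier path's children, so tasks on different machines see the same count.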
On Tue, Jul 3, 2012 at 5:48 AM, Yang <[EMAIL PROTECTED]> wrote:
> let's say my job can run on 4 mapper slots, but if there is only 1 slot
> available,
> I don't want them to run one by one, and have to wait till the time that at
> least 4 slots are available.
>
> is it possible to force hadoop to do this?
>
> thanks!
> yang
-- Harsh J