Arun C Murthy 2013-01-18, 21:18
I am using Amazon EC2/EMR.
jps gives this
I am not sure how I can enable the CapacityScheduler on EC2/EMR machines.
On Fri, Jan 18, 2013 at 1:18 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> Take a look at the CapacityScheduler and 'High RAM' jobs, whereby you can run M map slots per node and request, per job, that each map task take N slots (where 1 <= N <= M).
> Some more info:
> On Jan 18, 2013, at 12:05 PM, Shaojun Zhao wrote:
>> Dear all,
>> I know it is best to use a small amount of memory in the mapper and reducer.
>> However, sometimes it is hard to do so. For example, in machine
>> learning algorithms, it is common to load the model into mem in the
>> mapper step. When the model is big, I have to allocate a lot of mem
>> for the mapper.
>> Here is my question: how can I configure Hadoop so that it does not fork
>> too many mappers and run out of physical memory?
>> My machines have 24G, and I have 100 of them. Each time, Hadoop will
>> fork 6 mappers on each machine, no matter what config I use. I really
>> want to reduce it to whatever number I want, for example, just 1
>> mapper per machine.
>> Here are the configs I tried. (I use streaming, and I pass the configs
>> on the command line.)
>> -Dmapred.child.java.opts=-Xmx8000m <-- did not bring down the number of mappers
>> -Dmapred.cluster.map.memory.mb=32000 <-- did not bring down the number of mappers
>> Am I missing something here?
>> I use Hadoop 0.20.205
>> Thanks a lot in advance!
> Arun C. Murthy
> Hortonworks Inc.