Arun C Murthy 2013-01-18, 21:18
Re: config for high memory jobs does not work, please help.
Shaojun Zhao 2013-01-18, 22:50
I am using Amazon EC2/EMR.
jps gives this:

16600 JobTracker
2732 RunJar
2504 StatePusher
31902 instance-controller.jar
23553 Jps
22444 RunJar
2077 NameNode

I am not sure how I can impose the CapacityScheduler on EC2/EMR machines.

-Shaojun

On Fri, Jan 18, 2013 at 1:18 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> Take a look at the CapacityScheduler and 'High RAM' jobs, whereby you can run M map slots per node and request, per job, that you want N (where N = max(1, N, M)).
>
> Some more info:
> http://hadoop.apache.org/docs/stable/capacity_scheduler.html#Resource+based+scheduling
> http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/
>
> hth,
> Arun
>
> On Jan 18, 2013, at 12:05 PM, Shaojun Zhao wrote:
>
>> Dear all,
>>
>> I know it is best to use a small amount of memory in mappers and reducers.
>> However, sometimes it is hard to do so. For example, in machine
>> learning algorithms, it is common to load the model into memory in the
>> map step. When the model is big, I have to allocate a lot of memory
>> for the mapper.
>>
>> Here is my question: how can I configure Hadoop so that it does not fork
>> too many mappers and run out of physical memory?
>>
>> My machines have 24 GB, and I have 100 of them. Each time, Hadoop
>> forks 6 mappers on each machine, no matter what config I use. I really
>> want to reduce it to whatever number I want, for example, just 1
>> mapper per machine.
>>
>> Here are the configs I tried. (I use streaming, and I pass the config
>> on the command line.)
>>
>> -Dmapred.child.java.opts=-Xmx8000m <-- did not bring down the number of mappers
>>
>> -Dmapred.cluster.map.memory.mb=32000 <-- did not bring down the number of mappers
>>
>> Am I missing something here?
>> I use Hadoop 0.20.205.
>>
>> Thanks a lot in advance!
>> -Shaojun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
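A minimal sketch of the slot arithmetic behind the 'High RAM' suggestion, assuming Hadoop 1.x CapacityScheduler semantics: the administrator sets the size of one map slot with mapred.cluster.map.memory.mb and a per-task ceiling with mapred.cluster.max.map.memory.mb, a job requests memory per map task with mapred.job.map.memory.mb, and each task then occupies ceil(request / slot size) slots. Under those assumptions this would also explain why the two -D options in the question had no visible effect: mapred.child.java.opts only sets the child JVM heap, and mapred.cluster.map.memory.mb is read on the cluster side rather than per job. The 6 slots per node come from the TaskTracker's own mapred.tasktracker.map.tasks.maximum, which a job cannot override. The function names below are hypothetical, and the 4000 MB slot size and 24000 MB ceiling are illustrative values (24 GB split across the 6 reported slots), not settings read from an EMR cluster.

```python
# Sketch of Hadoop 1.x CapacityScheduler "High RAM" slot arithmetic (assumed semantics).
# Function names are hypothetical; the property names in the comments are Hadoop 1.x keys.


def slots_per_task(job_mb: int, slot_mb: int, max_task_mb: int) -> int:
    """Map slots one task occupies: ceil(job_mb / slot_mb), at least 1.

    job_mb      -- per-job request (mapred.job.map.memory.mb)
    slot_mb     -- size of one map slot (mapred.cluster.map.memory.mb)
    max_task_mb -- cluster ceiling (mapred.cluster.max.map.memory.mb)
    """
    if job_mb > max_task_mb:
        # Assumed behaviour: a request above the cluster ceiling cannot be scheduled.
        raise ValueError(f"{job_mb} MB exceeds the {max_task_mb} MB per-task limit")
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-job_mb // slot_mb))


def maps_per_node(map_slots: int, job_mb: int, slot_mb: int, max_task_mb: int) -> int:
    """Concurrent map tasks of this job on a node that has `map_slots` map slots
    (mapred.tasktracker.map.tasks.maximum)."""
    return map_slots // slots_per_task(job_mb, slot_mb, max_task_mb)


if __name__ == "__main__":
    # Illustrative numbers: 24 GB node, 6 map slots, roughly 4000 MB per slot.
    for request in (3000, 8000, 24000):
        print(request, "MB ->", maps_per_node(6, request, 4000, 24000), "map task(s) per node")
```

With these assumed numbers, a request of 8000 MB would leave 3 map tasks per node and a request of 24000 MB would leave 1. This only takes effect if the JobTracker actually runs the CapacityScheduler with memory-based scheduling configured, which is a cluster-side configuration change rather than something a streaming job can switch on with -D options.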