Shaojun Zhao 2013-01-18, 20:05
Jeffrey Buell 2013-01-18, 20:23
Re: config for high memory jobs does not work, please help.
Arun C Murthy 2013-01-18, 22:54
Not sure about EMR, but if you install your own cluster on EC2 you can use the configs mentioned here:
http://hadoop.apache.org/docs/stable/capacity_scheduler.html

Arun

On Jan 18, 2013, at 2:50 PM, Shaojun Zhao wrote:

> I am using Amazon EC2/EMR.
> jps gives this:
> 16600 JobTracker
> 2732 RunJar
> 2504 StatePusher
> 31902 instance-controller.jar
> 23553 Jps
> 22444 RunJar
> 2077 NameNode
>
> I am not sure how I can impose the CapacityScheduler on EC2/EMR machines.
> -Shaojun
>
> On Fri, Jan 18, 2013 at 1:18 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> Take a look at the CapacityScheduler and 'High RAM' jobs, whereby you can run M map slots per node and request, per job, that each task take N of them (where 1 <= N <= M).
>>
>> Some more info:
>> http://hadoop.apache.org/docs/stable/capacity_scheduler.html#Resource+based+scheduling
>> http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/
>>
>> hth,
>> Arun
>>
>> On Jan 18, 2013, at 12:05 PM, Shaojun Zhao wrote:
>>
>>> Dear all,
>>>
>>> I know it is best to use a small amount of memory in the mapper and reducer.
>>> However, sometimes it is hard to do so. For example, in machine
>>> learning algorithms, it is common to load the model into memory in the
>>> mapper step. When the model is big, I have to allocate a lot of memory
>>> for the mapper.
>>>
>>> Here is my question: how can I configure Hadoop so that it does not fork
>>> too many mappers and run out of physical memory?
>>>
>>> My machines have 24 GB, and I have 100 of them. Each time, Hadoop will
>>> fork 6 mappers on each machine, no matter what config I use. I really
>>> want to reduce it to whatever number I want, for example, just 1
>>> mapper per machine.
>>>
>>> Here are the configs I tried. (I use streaming, and I pass the config
>>> on the command line.)
>>>
>>> -Dmapred.child.java.opts=-Xmx8000m <-- did not bring down the number of mappers
>>>
>>> -Dmapred.cluster.map.memory.mb=32000 <-- did not bring down the number of mappers
>>>
>>> Am I missing something here?
>>> I use Hadoop 0.20.205.
>>>
>>> Thanks a lot in advance!
>>> -Shaojun
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
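The slot arithmetic behind the 'High RAM' suggestion can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Hadoop code: it assumes, based on a reading of the Hadoop 1.x CapacityScheduler documentation linked above, that a map task occupies ceil(mapred.job.map.memory.mb / mapred.cluster.map.memory.mb) slots and that a job asking for more than mapred.cluster.max.map.memory.mb per task is refused. The 6 slots per node come from the thread. The 4000 MB slot size and 24000 MB cap are hypothetical values picked for a 24 GB node. The function names are illustrative only.

```python
import math

# Hypothetical cluster values for a 24 GB node with 6 map slots. The slot
# count comes from the thread; the MB figures are assumptions for illustration.
SLOTS_PER_NODE = 6
SLOT_MB = 4000        # stands in for mapred.cluster.map.memory.mb (cluster side)
MAX_TASK_MB = 24000   # stands in for mapred.cluster.max.map.memory.mb (cluster side)


def slots_per_task(job_task_mb: int, slot_mb: int = SLOT_MB,
                   max_task_mb: int = MAX_TASK_MB) -> int:
    """Slots one map task occupies, assuming requests round up to whole slots."""
    if job_task_mb > max_task_mb:
        # Assumption: the scheduler refuses jobs above the cluster maximum.
        raise ValueError(f"{job_task_mb} MB exceeds the {max_task_mb} MB cap")
    return max(1, math.ceil(job_task_mb / slot_mb))


def tasks_per_node(job_task_mb: int, slots_per_node: int = SLOTS_PER_NODE,
                   slot_mb: int = SLOT_MB,
                   max_task_mb: int = MAX_TASK_MB) -> int:
    """How many tasks of one job can run at once on a node (0 if none fit)."""
    n = slots_per_task(job_task_mb, slot_mb, max_task_mb)
    return slots_per_node // n


if __name__ == "__main__":
    # Each request stands in for a per-job mapred.job.map.memory.mb value.
    for request in (4000, 8000, 12000, 13000, 24000):
        print(f"{request:>6} MB -> {slots_per_task(request)} slot(s), "
              f"{tasks_per_node(request)} task(s) per node")
```

Under these assumptions, an 8000 MB request still lets three tasks share a node, and only a request above 12000 MB (four or more slots) leaves room for a single task. The -Xmx value in mapred.child.java.opts does not enter this calculation at all, since it only sizes the child JVM heap.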