HDFS, mail # user - config for high memory jobs does not work, please help.


Shaojun Zhao 2013-01-18, 20:05
Jeffrey Buell 2013-01-18, 20:23
Re: config for high memory jobs does not work, please help.
Arun C Murthy 2013-01-18, 22:54
Not sure about EMR, but if you install your own cluster on EC2 you can use the configs mentioned here:

>> http://hadoop.apache.org/docs/stable/capacity_scheduler.html

Arun
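
[Editor's sketch, not part of the original message] The 'High RAM' mechanism in the linked CapacityScheduler documentation works by letting one task occupy several slots. The cluster defines a slot size (mapred.cluster.map.memory.mb, set by the administrator), a job requests memory per map task (mapred.job.map.memory.mb, which must not exceed mapred.cluster.max.map.memory.mb), and the scheduler charges the task for enough slots to cover the request. The arithmetic below is a minimal illustration of that idea. The function names are hypothetical, and the 4000 MB slot size is an assumption (roughly 24 GB divided by the 6 map slots reported in the thread), not a value taken from the poster's cluster.

```python
import math


def slots_per_task(job_memory_mb: int, slot_memory_mb: int) -> int:
    """Number of slots one task occupies when it requests job_memory_mb
    and each slot is sized at slot_memory_mb (hypothetical helper)."""
    if job_memory_mb <= 0 or slot_memory_mb <= 0:
        raise ValueError("memory sizes must be positive")
    # A task always takes at least one slot; larger requests round up.
    return max(1, math.ceil(job_memory_mb / slot_memory_mb))


def concurrent_tasks_per_node(slots_per_node: int,
                              job_memory_mb: int,
                              slot_memory_mb: int) -> int:
    """How many such tasks fit on one node at the same time."""
    return slots_per_node // slots_per_task(job_memory_mb, slot_memory_mb)


if __name__ == "__main__":
    SLOTS_PER_NODE = 6      # from the thread: 6 mappers per machine
    SLOT_MEMORY_MB = 4000   # ASSUMPTION: about 24 GB / 6 slots

    for request_mb in (4000, 8000, 24000):
        print(request_mb,
              slots_per_task(request_mb, SLOT_MEMORY_MB),
              concurrent_tasks_per_node(SLOTS_PER_NODE, request_mb,
                                        SLOT_MEMORY_MB))
```

Under these assumed numbers, a job requesting 24000 MB per map task would occupy all 6 slots and so run one mapper per node. This only applies when the CapacityScheduler is the active scheduler and memory-based scheduling is configured cluster-side; the thread does not establish whether EMR allows that.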

On Jan 18, 2013, at 2:50 PM, Shaojun Zhao wrote:

> I am using Amazon EC2/EMR.
> jps gives this:
> 16600 JobTracker
> 2732 RunJar
> 2504 StatePusher
> 31902 instance-controller.jar
> 23553 Jps
> 22444 RunJar
> 2077 NameNode
>
> I am not sure how I can enable the CapacityScheduler on EC2/EMR machines.
> -Shaojun
>
> On Fri, Jan 18, 2013 at 1:18 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> Take a look at the CapacityScheduler and its 'High RAM' jobs, whereby you can run M map slots per node and request, per job, that each task occupy N of them (where 1 <= N <= M).
>>
>> Some more info:
>> http://hadoop.apache.org/docs/stable/capacity_scheduler.html#Resource+based+scheduling
>> http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/
>>
>> hth,
>> Arun
>>
>> On Jan 18, 2013, at 12:05 PM, Shaojun Zhao wrote:
>>
>>> Dear all,
>>>
>>> I know it is best to use a small amount of memory in the mapper and
>>> reducer. However, sometimes that is hard to do. For example, in machine
>>> learning algorithms, it is common to load the model into memory in the
>>> mapper step. When the model is big, I have to allocate a lot of memory
>>> for the mapper.
>>>
>>> Here is my question: how can I configure Hadoop so that it does not fork
>>> too many mappers and run out of physical memory?
>>>
>>> I have 100 machines with 24 GB of memory each. Every time, Hadoop
>>> forks 6 mappers on each machine, no matter what config I use. I really
>>> want to reduce that to whatever number I choose, for example, just 1
>>> mapper per machine.
>>>
>>> Here are the configs I tried. (I use streaming, and I pass the configs
>>> on the command line.)
>>>
>>> -Dmapred.child.java.opts=-Xmx8000m  <-- did not bring down the number of mappers
>>>
>>> -Dmapred.cluster.map.memory.mb=32000  <-- did not bring down the number of mappers
>>>
>>> Am I missing something here?
>>> I use Hadoop 0.20.205
>>>
>>> Thanks a lot in advance!
>>> -Shaojun
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/