Hadoop, mail # user - memory management of capacity scheduling


Re: memory management of capacity scheduling
Hemanth Yamijala 2010-06-26, 18:09
Shashank,

> Hi,
>
> Setup Info:
> I have 2 node hadoop (20.2) cluster on Linux boxes.
> HW info: 16 CPU (Hyperthreaded)
> RAM: 32 GB
>
> I am trying to configure capacity scheduling. I want to use memory
> management provided by capacity scheduler. But I am facing few issues.
> I have added hadoop-0.20.2-capacity-scheduler.jar in lib. Also added
> ‘mapred.jobtracker.taskScheduler’ in hadoop-site.xml
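[For readers following along: the setup described above is typically expressed with a snippet like the following in hadoop-site.xml. This is a sketch for Hadoop 0.20; `CapacityTaskScheduler` is the scheduler class shipped in the capacity-scheduler jar, but verify the class name against your distribution.]

```xml
<!-- hadoop-site.xml: switch the JobTracker to the capacity scheduler.
     Requires hadoop-0.20.2-capacity-scheduler.jar on the classpath (lib/). -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
```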

First things first - the memory management implementation in the
capacity scheduler has seen significant improvements in Hadoop 0.21.
Specifically, the implementation in Hadoop 0.20 could cause a high
degree of cluster under-utilization; that was fixed in MAPREDUCE-516
and subsequent JIRAs in Hadoop 0.21.

> I have added below in capacity-scheduler.xml file, but I get error:
>  <property>
>    <name>mapred.tasktracker.vmem.reserved</name>
>    <value>26624m</value>
>    <description>A number, in bytes, that represents an offset. The total
> VMEM
>        on the machine, minus this offset, is the VMEM node-limit for all
>        tasks, and their descendants, spawned by the TT.
>    </description>
>  </property>
>  <property>
>    <name>mapred.task.default.maxvmem</name>
>    <value>512k</value>
>    <description>A number, in bytes, that represents the default VMEM
>        task-limit associated with a task. Unless overridden by a job's
>        setting, this number defines the VMEM task-limit.
>    </description>
>  </property>
>  <property>
>    <name>mapred.task.limit.maxvmem</name>
>    <value>4096m</value>
>    <description>A number, in bytes, that represents the upper VMEM
> task-limit
>        associated with a task. Users, when specifying a VMEM task-limit for
>        their tasks, should not specify a limit which exceeds this amount.
>    </description>
>  </property>
>  <property>
>    <name>mapred.tasktracker.pmem.reserved</name>
>    <value>26624m</value>
>    <description>Physical Memory
>    </description>
> </property>

IIRC, these parameters were removed and new parameters were
introduced in their place. Trunk's documentation has been updated with
the exact list of these parameters, their descriptions and usage - but
I suspect the parameter names have changed between Hadoop 0.20 and
trunk. Your best bet is probably to try the parameters listed in
http://bit.ly/97SDz2.
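[To illustrate the revamped scheme: the newer parameters express limits in MB per slot rather than bytes of VMEM. The sketch below uses the parameter names from the reworked memory monitoring; values are illustrative for a 32 GB node, and the exact names should be checked against the documentation linked above.]

```xml
<!-- Cluster-wide slot sizes and caps, values in MB (not bytes). -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>2048</value>  <!-- memory per map slot -->
</property>
<property>
  <name>mapred.cluster.reduce.memory.mb</name>
  <value>2048</value>  <!-- memory per reduce slot -->
</property>
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>4096</value>  <!-- upper bound a job may request per map task -->
</property>
<property>
  <name>mapred.cluster.max.reduce.memory.mb</name>
  <value>4096</value>  <!-- upper bound per reduce task -->
</property>

<!-- Per-job requests, overriding the cluster defaults. -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>3072</value>
</property>
```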

>
> Error:
> 2010-06-25 08:02:06,026 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
> start task tracker because java.io.IOException: Call to
> node1.hadoopcluster.com/192.168.1.241:9001 failed on local exception:
> java.io.IOException: Connection reset by peer
>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown
> Source)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
>        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
>        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
>        at
> org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:514)
>        at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:934)
>        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
> Caused by: java.io.IOException: Connection reset by peer
>        at sun.nio.ch.FileDispatcher.read0(Native Method)
>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:234)
>        at sun.nio.ch.IOUtil.read(IOUtil.java:207)
>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>        at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>        at