Re: Memory config for Hadoop cluster
Hi,

On Fri, Nov 5, 2010 at 2:23 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> Right. I meant I'm not using the fair or capacity scheduler. I'm getting
> out-of-memory errors in some jobs and was trying to optimize the memory
> settings, number of tasks, etc. I'm running 0.20.2.
>

The first thing most people do for this is to tweak the child.opts
setting (mapred.child.java.opts) to give more heap space to their map
or reduce tasks. I presume you've already done this? If not, it may be
worth a try. It's by far the easiest way to fix out-of-memory errors.
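
For reference, a minimal sketch of what that could look like in
mapred-site.xml on 0.20.x. The 1024 MB heap is only an illustrative
value, not a figure from this thread; it should be sized against the
memory available per task slot and kept within any task memory limit
configured on the cluster.

```xml
<!-- mapred-site.xml: JVM options passed to each map/reduce child task -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- Illustrative value only; the 0.20 default is -Xmx200m -->
  <value>-Xmx1024m</value>
</property>
```

The same property can also be set per job rather than cluster-wide,
assuming the job's configuration is allowed to override it.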

> Why can't mapred.job.map.memory.mb and mapred.job.reduce.memory.mb
> be left out of mapred-site.xml and just default to the equivalent cluster
> values if they are not set in the job either?

If these parameters are set in mapred-site.xml in all places (the
client, the job tracker and the task trackers) and they are not being
set in the job, this should suffice. However, if they are missing in
any one of these places, the job would get submitted with the default
value of -1, and since these are job-specific parameters, that value
would override the preconfigured settings on the cluster. The limits
are given in MB, which would explain the limit of -1048576 bytes
(-1 MB) in your error. If you want to be sure, you could mark the
settings as 'final' on the job tracker and the task trackers. Then a
job submission would not be able to override the settings.
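
As a sketch of the 'final' approach, assuming the goal is to pin the
per-job limits on the job tracker and the task trackers: the values
below simply reuse the 896 MB and 1024 MB figures from earlier in the
thread and are illustrative, not recommended settings.

```xml
<!-- mapred-site.xml on the job tracker and the task trackers -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>896</value>
  <!-- final: a submitted job's configuration cannot override this value -->
  <final>true</final>
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>1024</value>
  <final>true</final>
</property>
```

With the properties marked final, a job submitted with the default of
-1 should pick up the values above instead.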

Thanks
Hemanth
>
> -Amandeep
>
> On Nov 5, 2010, at 1:43 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm not using any scheduler. I don't have multiple jobs running at the same
> time on the cluster.
>
> That probably means you are using the default scheduler. Please note
> that the default scheduler does not have the ability to schedule tasks
> intelligently using the memory configuration parameters you specify.
> Could you tell us what you'd like to achieve?
>
> The documentation here: http://bit.ly/cCbAab (and the link it has to
> similar documentation in the Cluster Setup guide) will probably shed
> more light on how the parameters should be used. Note that this is in
> Hadoop 0.21, and the names of the parameters are different, though you
> can see the correspondence with similar variables in Hadoop 0.20.
>
> Thanks
> Hemanth
>
> -Amandeep
>
> On Fri, Nov 5, 2010 at 12:21 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
>
> Amandeep,
>
> Which scheduler are you using?
>
> Thanks
> hemanth
>
> On Tue, Nov 2, 2010 at 2:44 AM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
>
> How are the following configs supposed to be used?
>
> mapred.cluster.map.memory.mb
> mapred.cluster.reduce.memory.mb
> mapred.cluster.max.map.memory.mb
> mapred.cluster.max.reduce.memory.mb
> mapred.job.map.memory.mb
> mapred.job.reduce.memory.mb
>
> These were included in 0.20 in HADOOP-5881.
>
> Of the above, I'm setting only the following in my mapred-site.xml:
>
> mapred.cluster.map.memory.mb=896
> mapred.cluster.reduce.memory.mb=1024
>
> When I run a job, I get the following error:
>
> TaskTree [pid=1958,tipID=attempt_201011012101_0001_m_000000_0] is
> running beyond memory-limits. Current usage : 1358553088bytes. Limit :
> -1048576bytes. Killing task.
>
> I'm not sure how it got the Limit as -1048576bytes... Also, what are the
> cluster.max params supposed to be set to? Are they the max for the entire
> cluster or for a particular node?
>
> -Amandeep
>