Hadoop >> mail # user >> Memory config for Hadoop cluster


Re: Memory config for Hadoop cluster
Amandeep,

On Fri, Nov 5, 2010 at 11:54 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 5, 2010 at 2:00 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> On Fri, Nov 5, 2010 at 2:23 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
>> > Right. I meant I'm not using the fair or capacity scheduler. I'm getting
>> > out-of-memory errors in some jobs and was trying to optimize the memory
>> > settings, number of tasks, etc. I'm running 0.20.2.
>> >
>>
>> The first thing most people do for this is to tweak the child.opts
>> setting to give higher heap space to their map or reduce tasks. I
>> presume you've already done this ? If not, maybe worth a try. It's by
>> far the easiest way to fix the out of memory errors.
>>
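For reference, the child.opts setting mentioned above corresponds to the
mapred.child.java.opts property in Hadoop 0.20. A minimal sketch of raising
the task heap in mapred-site.xml could look like the following. The 512 MB
figure is an illustrative assumption, not a value taken from this thread.

```xml
<!-- mapred-site.xml fragment; -Xmx512m is an illustrative value only -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

The heap given to each task, multiplied by the number of map and reduce
slots on a node, should still fit within that node's physical memory.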
>
> Yup, I've done those and also played around with the number of tasks. I've
> been able to get jobs to go through without errors that way, but I wanted to
> use these configs to catch the case where a particular job is taking more
> memory than the cluster can afford to give.
>

It seems like you want to enable task memory monitoring and kill tasks
that are taking up more than what is affordable. That use case can
ideally be supported by Hadoop using the config options you are
playing with, irrespective of the scheduler in use. Let's see if we
can get this to work for you.

>>
>> > Why can't mapred.job.map.memory.mb and mapred.job.reduce.memory.mb be
>> > left out of mapred-site.xml and just default to the equivalent cluster
>> > values if they are not set in the job either?
>>
>> If these parameters are set in mapred-site.xml in all places (the
>> client, the job tracker and the task trackers) and they are not being
>> set in the job, this should suffice. However, if they are not set in
>> any one of these places, jobs would get submitted with the default value
>> of -1, and since these are job-specific parameters, that value would
>> override the preconfigured settings on the cluster. If you want to be
>> sure, you could mark the settings as 'final' on the job tracker and
>> the task trackers. Then no job submission would be able to override
>> the settings.
>>
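As a sketch of the 'final' suggestion above, the two job-level parameters
named in this thread could be pinned in mapred-site.xml on the job tracker
and task tracker nodes roughly as follows. The 1024 MB values are
illustrative assumptions and should be replaced with whatever the cluster
can actually afford per task.

```xml
<!-- mapred-site.xml fragment for the job tracker and task tracker nodes.
     The 1024 values are illustrative assumptions, not from this thread. -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>1024</value>
  <final>true</final>
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>1024</value>
  <final>true</final>
</property>
```

A property marked final in a configuration resource cannot be overridden by
resources loaded after it, which includes the configuration submitted with
a job.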
>
> I see the following in the TT logs:
>
>
> 2010-11-05 09:28:54,307 WARN org.apache.hadoop.mapred.TaskTracker
> (main): TaskTracker's totalMemoryAllottedForTasks is -1.
> TaskMemoryManager is disabled.
>
> But the configs are present in the mapred-site.xml files all across the
> cluster. The jobs are being submitted from the master node, so that takes
> care of the client part. I'm not sure why the configs aren't getting
> populated.
>

Could you paste a link to the mapred-site.xml from one of the
tasktracker nodes? Also, I am assuming the OS is Linux?

> Thanks
>> Hemanth
>> >
>> > -Amandeep
>> >
>> > On Nov 5, 2010, at 1:43 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
>> >
>> > Hi,
>> >
>> >
>> > I'm not using any scheduler. Don't have multiple jobs running at the same
>> > time on the cluster.
>> >
>> >
>> > That probably means you are using the default scheduler. Please note
>> > that the default scheduler does not have the ability to schedule tasks
>> > intelligently using the memory configuration parameters you specify.
>> > Could you tell us what you'd like to achieve ?
>> >
>> > The documentation here: http://bit.ly/cCbAab (and the link it has to
>> > similar documentation in the Cluster Setup guide) will probably shed
>> > more light on how the parameters should be used. Note that this is in
>> > Hadoop 0.21, and the names of the parameters are different, though you
>> > can see the correspondence with similar variables in Hadoop 0.20.
>> >
>> > Thanks
>> > Hemanth
>> >
>> >
>> > -Amandeep
>> >
>> >
>> > On Fri, Nov 5, 2010 at 12:21 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
>> >
>> >
>> > Amandeep,
>> >
>> >
>> > Which scheduler are you using ?
>> >
>> >
>> > Thanks
>> >
>> > hemanth
>> >
>> >
>> > On Tue, Nov 2, 2010 at 2:44 AM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
>> >
>> > How are the following configs supposed to be used?