-
Memory config for Hadoop cluster (Amandeep Khurana, 2010-11-01, 21:14)
How are the following configs supposed to be used?

mapred.cluster.map.memory.mb
mapred.cluster.reduce.memory.mb
mapred.cluster.max.map.memory.mb
mapred.cluster.max.reduce.memory.mb
mapred.job.map.memory.mb
mapred.job.reduce.memory.mb

These were included in 0.20 in HADOOP-5881.

Of these, I'm setting only the following in my mapred-site.xml:

mapred.cluster.map.memory.mb=896
mapred.cluster.reduce.memory.mb=1024

When I run a job, I get the following error:

TaskTree [pid=1958,tipID=attempt_201011012101_0001_m_000000_0] is running beyond memory-limits. Current usage : 1358553088bytes. Limit : -1048576bytes. Killing task.

I'm not sure how it got a limit of -1048576 bytes. Also, what are the cluster.max params supposed to be set to? Are they the maximum for the entire cluster or for a particular node?

-Amandeep
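The reported limit is consistent with a per-task setting of -1 MB converted to bytes. The sketch below assumes, as Hemanth explains later in the thread, that an unset mapred.job.map.memory.mb carries the default value -1, and that the byte limit is that value multiplied by 1024 * 1024. The helper name `limit_bytes` is hypothetical.

```python
# Hypothetical helper: converts a per-task memory setting (MB) into the
# byte value printed in the TaskTree kill message. Assumes the limit is
# simply the configured MB value times 1024 * 1024.
BYTES_PER_MB = 1024 * 1024


def limit_bytes(memory_mb: int) -> int:
    """Convert a memory setting in megabytes to bytes."""
    return memory_mb * BYTES_PER_MB


# Assumed default for an unset mapred.job.map.memory.mb
print(limit_bytes(-1))  # -1048576, the "Limit" in the error

# The value actually configured for mapred.cluster.map.memory.mb
print(limit_bytes(896))  # 939524096

# The "Current usage" from the error, expressed in MB
print(round(1358553088 / BYTES_PER_MB))  # 1296
```

Any real process exceeds a negative limit, so under this reading every monitored task would be reported as running beyond its memory limit.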
-
Re: Memory config for Hadoop cluster (Hemanth Yamijala, 2010-11-05, 07:21)
Amandeep,

Which scheduler are you using?

Thanks,
Hemanth
-
Re: Memory config for Hadoop cluster (Amandeep Khurana, 2010-11-05, 07:57)
Hemanth,

I'm not using any scheduler. I don't have multiple jobs running at the same time on the cluster.

-Amandeep
-
Re: Memory config for Hadoop cluster (Hemanth Yamijala, 2010-11-05, 08:43)
Hi,

> I'm not using any scheduler.. Dont have multiple jobs running at the same
> time on the cluster.

That probably means you are using the default scheduler. Please note that the default scheduler cannot schedule tasks intelligently using the memory configuration parameters you specify. Could you tell us what you'd like to achieve?

The documentation here: http://bit.ly/cCbAab (and the link it contains to similar documentation in the Cluster Setup guide) will probably shed more light on how the parameters should be used. Note that this documentation is for Hadoop 0.21, where the parameter names differ, though you can see how they correspond to the similar variables in Hadoop 0.20.

Thanks,
Hemanth
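To illustrate what scheduling "intelligently using the memory configuration parameters" can mean, here is a simplified, hypothetical model of memory-aware slot accounting, of the kind commonly described for the capacity scheduler: a task that asks for more memory than one slot provides is charged several slots, and a request above the per-task maximum is rejected. Treating the cluster.max values as per-task caps is an assumption of this sketch, and `slots_for_task` is not Hadoop code.

```python
import math


def slots_for_task(task_mb: int, slot_mb: int, max_task_mb: int) -> int:
    """Hypothetical model of memory-aware slot accounting.

    task_mb     -- memory a task asks for (cf. mapred.job.map.memory.mb)
    slot_mb     -- memory one slot provides (cf. mapred.cluster.map.memory.mb)
    max_task_mb -- assumed per-task cap (cf. mapred.cluster.max.map.memory.mb)
    """
    if task_mb > max_task_mb:
        # Assumption: oversized requests are rejected rather than scheduled.
        raise ValueError(
            f"task asks for {task_mb} MB, above the {max_task_mb} MB per-task maximum"
        )
    # A task occupies as many whole slots as needed to cover its request.
    return math.ceil(task_mb / slot_mb)


# 896 MB slots come from the original post; the 2048 MB cap is illustrative.
print(slots_for_task(896, 896, 2048))   # 1
print(slots_for_task(1500, 896, 2048))  # 2
```

Per Hemanth's note, the default scheduler performs no such accounting, so setting these values alone does not change how tasks are placed.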
-
Re: Memory config for Hadoop cluster (Amandeep Khurana, 2010-11-05, 08:53)
Right. I meant I'm not using the fair or capacity scheduler. I'm getting out-of-memory errors in some jobs and was trying to optimize the memory settings, number of tasks, etc. I'm running 0.20.2.

Why can't mapred.job.map.memory.mb and mapred.job.reduce.memory.mb be put in mapred-site.xml and just default to the equivalent cluster values if they are not set in the job either?

-Amandeep
-
Re: Memory config for Hadoop cluster (Hemanth Yamijala, 2010-11-05, 09:00)
Hi,

On Fri, Nov 5, 2010 at 2:23 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> Right. I meant I'm not using fair or capacity scheduler. I'm getting out of
> memory in some jobs and was trying to optimize the memory settings, number
> of tasks etc. I'm running 0.20.2.

The first thing most people do for this is to tweak the child.opts setting (mapred.child.java.opts) to give more heap space to their map or reduce tasks. I presume you've already done this? If not, it may be worth a try. It's by far the easiest way to fix out-of-memory errors.

> Why can't the mapred.job.map.memory.mb and mapred.job.reduce.memory.mb
> be not put in the mapred-site.xml and just default to the equivalent cluster
> baked if they are not set in the job either?

If these parameters are set in mapred-site.xml in all places (the client, the JobTracker, and the TaskTrackers) and are not set in the job, this should suffice. However, if they are missing from any one of these places, jobs get submitted with the default value of -1, and since these are job-specific parameters, that value overrides the preconfigured settings on the cluster. If you want to be sure, you could mark the settings as 'final' on the JobTracker and the TaskTrackers. Then nothing submitted with a job can override them.

Thanks,
Hemanth
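The override behaviour described above can be modelled in a few lines. This is a simplified, hypothetical sketch, not Hadoop's Configuration code: each layer maps a property name to a (value, is_final) pair, a value shipped with the job wins unless the daemon's own copy is marked final, and -1 stands for the default of an unset memory property.

```python
from typing import Dict, Tuple

# property name -> (value, is_final); a simplified stand-in for a config file
Layer = Dict[str, Tuple[int, bool]]

DISABLED = -1  # default value of an unset memory property, per the thread


def effective_value(name: str, daemon: Layer, job: Layer) -> int:
    """Resolve a property on a daemon after a job has been submitted.

    The job's value overrides the daemon's own setting unless the daemon
    marked that setting final.
    """
    daemon_value, daemon_final = daemon.get(name, (DISABLED, False))
    if daemon_final or name not in job:
        return daemon_value
    return job[name][0]


KEY = "mapred.job.map.memory.mb"
# The client lacks the setting, so the job carries the default of -1.
job = {KEY: (DISABLED, False)}

tt_plain = {KEY: (896, False)}
tt_final = {KEY: (896, True)}

print(effective_value(KEY, tt_plain, job))  # -1: the job's default wins
print(effective_value(KEY, tt_final, job))  # 896: final blocks the override
```

In an actual mapred-site.xml, a setting is marked final by adding `<final>true</final>` inside its `<property>` element.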
-
Re: Memory config for Hadoop cluster (Amandeep Khurana, 2010-11-05, 18:24)
On Fri, Nov 5, 2010 at 2:00 AM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
> The first thing most people do for this is to tweak the child.opts
> setting to give higher heap space to their map or reduce tasks. I
> presume you've already done this ? If not, maybe worth a try. It's by
> far the easiest way to fix the out of memory errors.

Yup, I've done that and also played around with the number of tasks. I've been able to get jobs to go through without errors that way, but I wanted to use these configs to handle the case where a particular job takes more memory than the cluster can afford to give.

> If these parameters are set in mapred-site.xml on all places - the
> client, the job tracker and the task trackers and they are not being
> set in the job, this should suffice. However, if they are not set on
> any one of these places, they'd get submitted with the default value
> of -1, and since these are job specific parameters, they would
> override the preconfigured settings on the cluster. If you want to be
> sure, you could mark the settings as 'final' on the job tracker and
> the task trackers. Then any submission by the job would not override
> the settings.

I see the following in the TaskTracker logs:

2010-11-05 09:28:54,307 WARN org.apache.hadoop.mapred.TaskTracker (main): TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.

But the configs are present in the mapred-site.xml files all across the cluster. The jobs are being submitted from the master node, so that takes care of the client part. I'm not sure why the configs aren't getting populated.
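One plausible reading of the TaskTracker warning is that the daemon computes a total from its slot counts and per-slot memory, and leaves memory monitoring off when a per-slot value is still -1. The sketch below encodes that reading; it is a simplified, hypothetical model, not the TaskTracker source, and the slot counts (which would correspond to mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum) are illustrative.

```python
DISABLED = -1  # value of an unset memory property


def total_memory_for_tasks(map_slots: int, reduce_slots: int,
                           map_slot_mb: int, reduce_slot_mb: int) -> int:
    """Assumed model: slots times per-slot memory, or -1 if either size is unset."""
    if map_slot_mb == DISABLED or reduce_slot_mb == DISABLED:
        return DISABLED
    return map_slots * map_slot_mb + reduce_slots * reduce_slot_mb


def memory_manager_enabled(total_mb: int) -> bool:
    """Monitoring stays off when the total is -1, as in the logged warning."""
    return total_mb != DISABLED


# Settings from the original post, with 4 map and 2 reduce slots (illustrative).
total = total_memory_for_tasks(4, 2, 896, 1024)
print(total, memory_manager_enabled(total))  # 5632 True

# If the daemon never sees mapred.cluster.*.memory.mb, both sizes stay -1.
total = total_memory_for_tasks(4, 2, DISABLED, DISABLED)
print(total, memory_manager_enabled(total))  # -1 False
```

Under this reading, a total of -1 on a node whose file does contain the values would suggest the TaskTracker is not reading that file, for example because it loads a different configuration directory or was not restarted after the edit.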
-
Re: Memory config for Hadoop cluster (Hemanth Yamijala, 2010-11-07, 09:19)
Amandeep,

On Fri, Nov 5, 2010 at 11:54 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> Yup, I've done those and also played around with the number of tasks.. I've
> been able to get jobs to go through without errors with them but I wanted to
> use these configs to make sure that if a particular job is taking more
> memory than the cluster can afford to give.

It seems like you want to enable task memory monitoring and kill tasks that take up more memory than is affordable. Ideally, Hadoop supports that use case with the config options you are experimenting with, irrespective of the scheduler in use. Let's see if we can get this to work for you.

> I see the following in the TT logs:
>
> 2010-11-05 09:28:54,307 WARN org.apache.hadoop.mapred.TaskTracker
> (main): TaskTracker's totalMemoryAllottedForTasks is -1.
> TaskMemoryManager is disabled.
>
> But the configs are present in the mapred-site.xmls all across the cluster..
> The jobs are being submitted from the master node, so that takes care of the
> client part. I'm not sure why the configs arent getting populated.

Could you paste a link to the mapred-site.xml on one of the TaskTracker nodes? Also, I'm assuming the OS is Linux?
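Before sharing the file, one quick way to see which memory properties a TaskTracker node's mapred-site.xml actually defines, and which are marked final, is to parse it. The script below is a hypothetical helper using only Python's standard library; it assumes Hadoop's usual `<configuration>`/`<property>` layout, takes the file path as its first argument, and falls back to a small inline sample otherwise.

```python
import sys
import xml.etree.ElementTree as ET

MEMORY_KEYS = [
    "mapred.cluster.map.memory.mb",
    "mapred.cluster.reduce.memory.mb",
    "mapred.cluster.max.map.memory.mb",
    "mapred.cluster.max.reduce.memory.mb",
    "mapred.job.map.memory.mb",
    "mapred.job.reduce.memory.mb",
]

# Illustrative file contents; the values come from the original post.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.cluster.map.memory.mb</name>
    <value>896</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.cluster.reduce.memory.mb</name>
    <value>1024</value>
  </property>
</configuration>
"""


def memory_settings(xml_text: str) -> dict:
    """Return {name: (value, is_final)} for the memory properties found."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in MEMORY_KEYS:
            value = (prop.findtext("value") or "").strip()
            final = (prop.findtext("final") or "").strip().lower() == "true"
            found[name] = (value, final)
    return found


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    settings = memory_settings(text)
    for key in MEMORY_KEYS:
        value, final = settings.get(key, ("<unset>", False))
        print(f"{key} = {value}{' (final)' if final else ''}")
```

Run it with the path of the copy the TaskTracker actually loads (for example `python check_memory_conf.py /path/to/mapred-site.xml`, where the script name and path are placeholders); any key printed as `<unset>` would fall back to its default.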