|
anil gupta
2012-07-27, 18:23
Harsh J
2012-07-27, 21:23
anil gupta
2012-07-27, 22:05
Harsh J
2012-07-27, 22:36
anil gupta
2012-07-27, 23:22
Harsh J
2012-07-27, 23:39
anil gupta
2012-07-30, 23:03
Rahul Jain
2012-07-30, 23:26
anil gupta
2012-07-30, 23:56
abhiTowson cal
2012-07-31, 02:30
anil gupta
2012-07-31, 02:47
abhiTowson cal
2012-07-31, 03:12
Rahul Jain
2012-07-31, 03:44
anil gupta
2012-07-31, 03:51
abhiTowson cal
2012-07-31, 04:21
anil gupta
2012-07-31, 18:26
anil gupta
2012-08-02, 23:25
|
-
YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) anil gupta 2012-07-27, 18:23
Hi All,

I have a Hadoop 2.0 alpha (CDH4) Hadoop/HBase cluster running on CentOS 6.0. The cluster has 4 admin nodes and 8 data nodes. The RM and History Server run on one machine, and the RM web interface shows that 8 nodes are connected to it. I installed this cluster with HA capability and have already tested HA for the NameNodes, ZK, and HBase Master. I am running the pi example MapReduce job as user "root", and I have created the "/user/root" directory in HDFS.

Last few lines of the log of one of the NodeManagers:

2012-07-26 21:58:38,745 INFO org.mortbay.log: Extract jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.0.0-cdh4.0.0.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2012-07-26 21:58:38,907 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:8042
2012-07-26 21:58:38,907 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2012-07-26 21:58:38,919 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2012-07-26 21:58:38,919 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is started.
2012-07-26 21:58:38,919 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is started.
2012-07-26 21:58:38,922 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connected to ResourceManager at ihub-an-l1/172.31.192.151:8025
2012-07-26 21:58:38,924 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as ihub-dn-l2:53199 with total resource of memory: 1200
2012-07-26 21:58:38,924 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started.
2012-07-26 21:58:38,929 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
*2012-07-26 21:58:38,929 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is stopped.*

Why is the NodeStatusUpdaterImpl stopped?

Here are the last few lines of the RM log:

2012-07-27 09:38:24,644 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 2
2012-07-27 09:38:25,310 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 2 submitted by user root
2012-07-27 09:38:25,310 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root IP=172.31.192.51 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1343365114818_0002
2012-07-27 09:38:25,310 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1343365114818_0002 State change from NEW to SUBMITTED
2012-07-27 09:38:25,311 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1343365114818_0002_000001
2012-07-27 09:38:25,311 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1343365114818_0002_000001 State change from NEW to SUBMITTED
2012-07-27 09:38:25,311 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Application Submission: application_1343365114818_0002 from root, currently active: 1
2012-07-27 09:38:25,311 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1343365114818_0002_000001 State change from SUBMITTED to SCHEDULED
2012-07-27 09:38:25,311 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1343365114818_0002 State change from SUBMITTED to ACCEPTED

The Pi example job has been stuck for the last hour. Why is it not trying to start tasks on the NMs?

Here is the command I ran to start the job:

[root@ihub-nn-a1 hadoop-yarn]# hadoop --config /etc/hadoop/conf/ jar /usr/lib/hadoop-mapreduce/hadoop-*-examples.jar pi 10 100000
Number of Maps = 10
Samples per Map = 100000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/07/27 09:38:27 INFO input.FileInputFormat: Total input paths to process
12/07/27 09:38:27 INFO mapreduce.JobSubmitter: number of splits:10
12/07/27 09:38:27 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/07/27 09:38:27 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
12/07/27 09:38:27 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
12/07/27 09:38:27 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
12/07/27 09:38:27 WARN conf.Configuration: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
12/07/27 09:38:27 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
12/07/27 09:38:27 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/07/27 09:38:27 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
12/07/27 09:38:27 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
12/07/27 09:38:27 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12/07/27 09:38:27 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/07/27 09:38:27 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
12/07/27 09:38:27 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/07/27 09:38:27 WARN conf.Configurati
-
Re: YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) Harsh J 2012-07-27, 21:23
Can you share your yarn-site.xml contents? Have you tweaked memory sizes in there?

Harsh J
-
Re: YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) anil gupta 2012-07-27, 22:05
Hi Harsh,

I have set *yarn.nodemanager.resource.memory-mb* to 1200 MB. Also, does it matter if I run the jobs as "root" while the RM and NM services are running as the "yarn" user? I have already created the /user/root directory for the root user in HDFS.

Here is the yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/disk/yarn/local</value>
  </property>

  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/disk/yarn/logs</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>

  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $YARN_HOME/*,$YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ihub-an-l1:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ihub-an-l1:8040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ihub-an-l1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ihub-an-l1:8141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ihub-an-l1:8088</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/disk/mapred/jobhistory/intermediate/done</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/disk/mapred/jobhistory/done</value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>ihub-an-l1:9999</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1200</value>
  </property>
</configuration>

Thanks & Regards,
Anil Gupta
-
Re: YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) Harsh J 2012-07-27, 22:36
Hi,

The 'root' user doesn't matter. You may run jobs as any username on an unsecured cluster; it should be just the same.

The config yarn.nodemanager.resource.memory-mb = 1200 is your issue. By default, tasks execute with a resource demand of 1 GB, and the AM itself demands, by default, 1.5 GB to run. None of your nodes can therefore start your AM (demand = 1500 MB), and if the AM doesn't start, your job won't initiate either.

You can do a few things:

1. Raise yarn.nodemanager.resource.memory-mb to a value close to 4 GB perhaps, if you have the RAM? Think of it as the new 'slots' divider. The larger the offering (close to the total RAM you can offer for containers from the machine), the more tasks may run on it (depending on their own demand, of course). Restart the NMs one by one and this app will begin to execute.
2. Lower the AM's requirement, i.e. lower yarn.app.mapreduce.am.resource.mb in your client's mapred-site.xml or job config from 1500 to 1000 or less, so it fits in the NM's offering. Likewise, control the map and reduce requests via mapreduce.map.memory.mb and mapreduce.reduce.memory.mb as needed. Resubmit the job with these lowered requirements and things should now work.

Optionally, you may also cap the max/min possible requests via "yarn.scheduler.minimum-allocation-mb" and "yarn.scheduler.maximum-allocation-mb", so that no app/job ends up demanding more than a certain limit and running into the 'forever-waiting' state as in your case.

Hope this helps! For some communication diagrams on how an app (such as MR2, etc.) may work on YARN and how the resource negotiation works, you can check out this post from Ahmed at http://www.cloudera.com/blog/2012/02/mapreduce-2-0-in-hadoop-0-23/

Harsh J
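A minimal sketch of option 2 above, written as a fragment for the client's mapred-site.xml. The three property names are the ones named in this reply. The value 1000 for the AM follows the "1000 or less" suggestion, while 512 for the map and reduce containers is an illustrative assumption, not a setting tested on this cluster.

```xml
<configuration>
  <!-- AM container request, lowered from the 1500 MB default cited in the reply -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1000</value>
  </property>
  <!-- Assumed illustrative value: per-map container request in MB -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
  </property>
  <!-- Assumed illustrative value: per-reduce container request in MB -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
  </property>
</configuration>
```

With these values the AM request (1000 MB) falls under the 1200 MB that each NodeManager offers. Whether the scheduler rounds requests up to a multiple of its minimum allocation, and whether the JVM heap options also need lowering to fit inside the smaller containers, depends on the version and is not confirmed in this thread.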
-
Re: YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) anil gupta 2012-07-27, 23:22
Hi Harsh,

Thanks a lot for your response. I am going to try your suggestions and let you know the outcome.

I am running the cluster on a VMware hypervisor. I have 3 physical machines, each with 16 GB of RAM and 4 TB of disk (2 HDs of 2 TB each). On every machine I am running 4 VMs, each with 3.2 GB of memory. I built this cluster to try out HA (NN, ZK, HMaster), since we are a little reluctant to deploy anything without HA in prod.

This cluster is meant to be used as an HBase cluster, and MR is going to be used only for bulk loading. Also, my data dump is around 10 GB (which is pretty small for Hadoop). I am going to load this data into 4 different schemas, which will be roughly 150 million records in HBase.

So I think I will lower the memory requirements of YARN for my use case rather than reduce the number of data nodes to increase the memory of the remaining ones. Do you think this is the right approach for my cluster environment?

Also, on a side note, shouldn't the NodeManager throw an error on this kind of memory problem? Should I file a JIRA for this? It just sat there quietly.

Thanks a lot,
Anil Gupta
-
Re: YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) Harsh J 2012-07-27, 23:39
I think it's all right if we fail the app when it requests what is impossible, rather than log or wait for an admin to come along and fix it at runtime. Please do file a JIRA.

The max allocation value can perhaps also be dynamically set to the maximum offered RAM value across the NMs that are live, or a fraction of it? That is what caused this hang in the first place (by letting it go in as a valid request, since the default max alloc is about 10 GB).

Harsh J
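Until something like that dynamic limit exists, the cap suggested earlier in the thread could be set by hand in yarn-site.xml. This is only a sketch: the property names are the ones quoted in the thread and are assumed to apply to the scheduler and version in use, 1200 is assumed to equal yarn.nodemanager.resource.memory-mb on every NM, and 128 is an arbitrary illustrative minimum.

```xml
<configuration>
  <!-- Assumption: no NM offers more than 1200 MB, so no single request should exceed it -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>1200</value>
  </property>
  <!-- Assumed illustrative value for the smallest container the scheduler hands out -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
  </property>
</configuration>
```

How the scheduler treats a request above the maximum (reject it or clamp it) is version-dependent and is not stated in this thread.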
-
Re: YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) anil gupta 2012-07-30, 23:03
Hi Harsh,

I modified mapred-site.xml and yarn-site.xml so that MR jobs can run in 1.2 GB of memory. Here is the mapred-site.xml: http://pastebin.com/Fxjie6kg and the yarn-site.xml: http://pastebin.com/TCJuDAhe. After updating the configuration, the MR job seemingly starts map processes, but it fails at 0%. The web page at http://data-node:8042/node/containerlogs/container_1343687008058_0003_01_000001/root says:

Failed redirect for container_1343687008058_0003_01_000001
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured.
Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

Do you have any idea about this problem? I searched the internet and found this discussion on the CDH forum (https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/AwCRuaPm7e0), but no resolution was posted there.

Thanks,
Anil
-
Re: YARN Pi example job stuck at 0% (No MR tasks are started by ResourceManager) Rahul Jain 2012-07-30, 23:26
The inability to look at MapReduce logs for failed jobs is due to a number of open issues in YARN; see my recent comment here:

https://issues.apache.org/jira/browse/MAPREDUCE-4428?focusedCommentId=13412995&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13412995

You can work around this by enabling log aggregation and manually copying the job logs from the HDFS log location. Of course, that is a painful approach until the YARN log collection and history bugs are resolved in an upcoming release.

-Rahul

> 12/07/27 09:38:27 INFO mapred.ResourceMgrDelegate: Submitted application application_1343365114818_0002 to ResourceManager at ihub-an-l1/172.31.192.151:8040
> 12/07/27 09:38:27 INFO mapreduce.Job: The url to track the job: http://ihub-an-l1:9999/proxy/application_1343365114818_0002/
> 12/07/27 09:38:27 INFO mapreduce.Job: Running job: job_1343365114818_0002
>
> No Map-Reduce task are started by the cluster. I dont see any errors
> anywhere in the application. Please help me in resolving this problem.
>
> Thanks,
> Anil Gupta
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)anil gupta 2012-07-30, 23:56
Hi Rahul,
Thanks for your response. I can certainly set yarn.log-aggregation-enable
to true. But after enabling it, what manual steps will I have to take to
run jobs? Could you please elaborate?

Thanks,
Anil
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)abhiTowson cal 2012-07-31, 02:30
hi anil,
Adding this property helped resolve the issue for me:

yarn.resourcemanager.resource-tracker.address

Regards
Abhishek
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)anil gupta 2012-07-31, 02:47
Hi Abhishek,
Did you mean that adding yarn.resourcemanager.resource-tracker.address
along with yarn.log-aggregation-enable to my configuration will resolve the
problem in which the map-reduce job fails at 0% with the following error?
On the web page
http://data-node:8042/node/containerlogs/container_1343687008058_0003_01_000001/root
the page says:

Failed redirect for container_1343687008058_0003_01_000001
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured.
Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

Please let me know.

Thanks,
Anil Gupta
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)abhiTowson cal 2012-07-31, 03:12
Hi anil,
Adding the property resolved the issue for me, and I also made this change:

vim hadoop-env.sh

export JAVA_HOME=/usr/lib/java-1.6.0/jdk1.6.0_33
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi

if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

Regards
Abhishek
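The hadoop-env.sh change Abhishek describes only tests that JAVA_HOME is non-empty. A slightly stricter check, sketched below, would also confirm that the directory actually contains an executable `bin/java`. The function name `java_home_ok` is hypothetical and is not part of any Hadoop script; treat this as an illustration, not as what Abhishek ran.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given directory is non-empty
# and contains an executable bin/java.
#   $1 = candidate JAVA_HOME directory
java_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example: check whatever JAVA_HOME is currently exported in this shell.
if java_home_ok "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks usable: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or has no executable bin/java" >&2
fi
```

Running the same check on every node would be one way to follow the advice to make sure JAVA_HOME is set consistently across the cluster.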
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)Rahul Jain 2012-07-31, 03:44
Yes, do ensure that JAVA_HOME is properly set on all nodes through hadoop
env or the login shell.

Regarding aggregated logs, they will be found in HDFS under the directory
set through 'yarn.nodemanager.remote-app-log-dir', with a subdirectory for
each job.

-Rahul
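Rahul's pointer about aggregated logs could translate into something like the following sketch. It rests on assumptions: `/tmp/logs` stands in for the value of `yarn.nodemanager.remote-app-log-dir` (check yarn-site.xml for the real value), and because the directory layout below that root may differ between releases, the script filters a recursive listing by application ID instead of guessing the exact path. The helper name `filter_app_entries` is hypothetical.

```shell
#!/bin/sh
# Assumed value of yarn.nodemanager.remote-app-log-dir; adjust to match yarn-site.xml.
REMOTE_APP_LOG_DIR="${REMOTE_APP_LOG_DIR:-/tmp/logs}"

# Hypothetical helper: read a recursive HDFS listing on stdin and keep only
# the entries that mention the given application ID.
#   $1 = application ID, e.g. application_1343365114818_0002
filter_app_entries() {
    grep -F -- "$1"
}

# Usage on a cluster node (requires the hadoop client on PATH):
#   hadoop fs -ls -R "$REMOTE_APP_LOG_DIR" | filter_app_entries application_1343365114818_0002
#   hadoop fs -get <path printed above> ./job-logs
```

The application ID in the usage comment is the one from the original post; aggregated logs for it would only exist if log aggregation had been enabled before that job ran.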
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)anil gupta 2012-07-31, 03:51
That's pretty interesting! How did you figure out that you needed to add
that property? I am just trying to understand how adding that property
fixed the issue.
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)abhiTowson cal 2012-07-31, 04:21
Hi anil,
I was trying several things. I didn't have hadoop-env.sh, so I created it.

Regards
Abhishek
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)anil gupta 2012-07-31, 18:26
Hi Harsh and Others,
I was able to run the job when I logged in as user "hdfs". However, it
fails if I run it as "root". I had suspected this was the problem earlier,
and it turned out to be true.

Thanks,
Anil Gupta
-
Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)anil gupta 2012-08-02, 23:25
Hi Harsh,
I created the following JIRA for the NodeManager not reporting appropriate
errors when NM memory < AM memory:

MAPREDUCE-4508 - YARN needs to properly check the NM,AM memory properties in yarn-site.xml and mapred.xml and report errors accordingly.
<https://issues.apache.org/jira/browse/MAPREDUCE-4508>

Please let me know if anything else is required for the JIRA.

Thanks,
Anil Gupta
|