MapReduce dev mailing list: Problem while submitting jobs to NM started with ephemeral ports.

Prashant Sharma 2011-10-17, 08:33
Re: Problem while submitting jobs to NM started with ephemeral ports.
I also tried commenting out the last two properties in yarn-site.xml
mentioned above (yarn.nodemanager.address and
yarn.nodemanager.localizer.address), and keeping the following property
in mapred-site.xml:

      <name>mapreduce.shuffle.port</name>

With that, I got this exception while running the wordcount example:

 mapreduce.Job (Job.java:printTaskEvents(1315)) - Task Id : attempt_1318840789401_0005_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#5
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:126)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:365)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:227)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
Otherwise, everything works out of the box.
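`Exceeded MAX_FAILED_UNIQUE_FETCHES` roughly means the reducer's fetchers kept failing to copy map output from the shuffle service. One quick way to rule out a plain connectivity problem is to check whether anything accepts TCP connections on the shuffle port of the NM host. The class below is a hypothetical helper, not part of Hadoop; the host and port are assumptions you supply (e.g. the value you set for mapreduce.shuffle.port), and a successful connect only shows that some process is listening there, not that it is the ShuffleHandler.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic helper (not Hadoop code): is anything listening on host:port? */
public class ShufflePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: ShufflePortCheck <nm-host> <shuffle-port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

If the port is not reachable from the node running the reducer, the fetch failures above would be expected regardless of the job itself.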


On Mon, Oct 17, 2011 at 2:03 PM, Prashant Sharma wrote:
> I am using the following properties in yarn-site.xml:
> <property>
>   <name>yarn.nodemanager.aux-services</name>
>   <value>mapreduce.shuffle</value>
> </property>
> <property>
>   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
> </property>
> <property>
>   <name>yarn.nodemanager.address</name>
>   <value>localhost:0</value>
> </property>
> <property>
>   <name>yarn.nodemanager.localizer.address</name>
>   <value>localhost:0</value>
> </property>
> Everything starts fine (all daemons come up without errors). But
> when I submit a job, it gets stuck, and the NM logs say it is trying
> to connect to 'localhost:0'. Localization takes forever. Why?
> Please see the NM logs below:
> http://pastebin.com/QfQDZeqF
> Thanks,
> Prashant
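A possible reading of the 'localhost:0' symptom, offered as an assumption rather than a confirmed diagnosis: port 0 asks the OS for an ephemeral port at bind time, so the string "localhost:0" in the configuration is not itself a connectable address. Only the bound socket knows the real port, and any client that connects using the configured value instead of the bound one will fail. The sketch below shows this with plain java.net; EphemeralPortDemo and resolveAdvertised are hypothetical names, and nothing here describes how the NodeManager actually publishes its addresses.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical demo (not Hadoop code): what binding to port 0 actually yields. */
public class EphemeralPortDemo {

    /** Binds to loopback port 0 and returns the address the OS actually assigned. */
    public static InetSocketAddress bindEphemeral(ServerSocket server) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        server.bind(new InetSocketAddress(loopback, 0));
        // Once bound, getLocalPort() reports the kernel-assigned port, never 0.
        return new InetSocketAddress(loopback, server.getLocalPort());
    }

    /**
     * Hypothetical helper: replaces port 0 in a configured "host:port" string with the
     * port that was really bound; any other configured port is returned unchanged.
     * Assumes the string contains a colon.
     */
    public static String resolveAdvertised(String configured, int boundPort) {
        int colon = configured.lastIndexOf(':');
        String host = configured.substring(0, colon);
        int port = Integer.parseInt(configured.substring(colon + 1));
        return port == 0 ? host + ":" + boundPort : configured;
    }

    public static void main(String[] args) throws IOException {
        String configured = "localhost:0"; // value used in the yarn-site.xml above
        try (ServerSocket server = new ServerSocket()) {
            InetSocketAddress actual = bindEphemeral(server);
            System.out.println("configured: " + configured);
            System.out.println("advertised: " + resolveAdvertised(configured, actual.getPort()));
        }
    }
}
```

The point of the sketch is only that the real port must be read back from the bound socket and handed to whoever connects; reading it from the configuration yields port 0.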