Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out (MapReduce user mailing list)


bmdevelopment 2010-06-24, 19:29
Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Hi,

> I've been getting the following error when trying to run a very simple
> MapReduce job.
> The Map phase finishes without a problem, but the error occurs as soon as
> the job enters the Reduce phase.
>
> 10/06/24 18:41:00 INFO mapred.JobClient: Task Id :
> attempt_201006241812_0001_r_000000_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>
> I am running a 5-node cluster and I believe I have all my settings correct:
>
> * ulimit -n 32768
> * DNS/RDNS configured properly
> * hdfs-site.xml : http://pastebin.com/xuZ17bPM
> * mapred-site.xml : http://pastebin.com/JraVQZcW
>
> The program is very simple - just counts a unique string in a log file.
> See here: http://pastebin.com/5uRG3SFL
>
> When I run, the job fails and I get the following output.
> http://pastebin.com/AhW6StEb
>
> However, the job runs fine when I do *not* use substring() on the value
> (see the map function in the code above).
>
> This runs fine and completes successfully:
>            String str = val.toString();
>
> This causes error and fails:
>            String str = val.toString().substring(0,10);
>
> Please let me know if you need any further information.
> It would be greatly appreciated if anyone could shed some light on this problem.

It is striking that changing the code to use a substring makes a
difference. Assuming this is consistent and not a red herring, can you
compare the counters for the two jobs in the JobTracker web UI (map
output records, bytes, etc.) and see if there is a noticeable
difference? Also, are the two programs being run against exactly the
same input data?

Also, since the cluster is small, you could look at the TaskTracker
logs on the machines where the maps ran to see whether there are any
failures around the time the reduce attempts start failing.
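
One other thing worth ruling out: substring(0, 10) throws a
StringIndexOutOfBoundsException on any line shorter than 10 characters,
which would fail map tasks on input containing short or empty lines. I
can't see your full mapper from here, so the class below is only a
sketch against the old mapred API with assumed types (Text values in,
Text/IntWritable counts out); the class and variable names are mine,
not from your code:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical guarded version of the mapper from the pasted job.
    public class GuardedPrefixMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(LongWritable key, Text val,
                      OutputCollector<Text, IntWritable> output,
                      Reporter reporter) throws IOException {
        String line = val.toString();
        // Only take the 10-character prefix when the line is long enough;
        // substring(0, 10) would throw on shorter lines.
        String str = line.length() >= 10 ? line.substring(0, 10) : line;
        word.set(str);
        output.collect(word, ONE);
      }
    }

If the guarded version fails the same way, the substring itself is
probably not the culprit, and the counter comparison above becomes more
interesting.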

Thanks
Hemanth
Replies in this thread:
bmdevelopment 2010-06-25, 14:56
bmdevelopment 2010-07-05, 07:11
Hemanth Yamijala 2010-07-06, 08:34
bmdevelopment 2010-07-07, 08:02
bmdevelopment 2010-07-08, 06:49
Ted Yu 2010-07-08, 17:38
Todd Lipcon 2010-07-08, 18:22
bmdevelopment 2010-07-09, 03:26
Ted Yu 2010-07-09, 04:54
bmdevelopment 2010-07-09, 09:07
Ted Yu 2010-07-09, 13:18