MapReduce, mail # user - MRv2 jobs fail when run with more than one slave


Trevor 2012-07-17, 21:24
Re: MRv2 jobs fail when run with more than one slave
Karthik Kambatla 2012-07-17, 23:18
Forwarding your email to the cdh-user group.

Thanks
Karthik

On Tue, Jul 17, 2012 at 2:24 PM, Trevor <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now for some
> strange reason, my MRv2 jobs (TeraGen, specifically) fail if I run with
> more than one slave. For every slave except the one running the Application
> Master, I get the following failed tasks and warnings repeatedly:
>
> 12/07/13 14:21:55 INFO mapreduce.Job: Running job: job_1342207265272_0001
> 12/07/13 14:22:17 INFO mapreduce.Job: Job job_1342207265272_0001 running in uber mode : false
> 12/07/13 14:22:17 INFO mapreduce.Job:  map 0% reduce 0%
> 12/07/13 14:22:46 INFO mapreduce.Job:  map 1% reduce 0%
> 12/07/13 14:22:52 INFO mapreduce.Job:  map 2% reduce 0%
> 12/07/13 14:22:55 INFO mapreduce.Job:  map 3% reduce 0%
> 12/07/13 14:22:58 INFO mapreduce.Job:  map 4% reduce 0%
> 12/07/13 14:23:04 INFO mapreduce.Job:  map 5% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stderr
> 12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000003_0&filter=stdout
> ...
> 12/07/13 14:25:12 INFO mapreduce.Job:  map 25% reduce 0%
> 12/07/13 14:25:12 INFO mapreduce.Job: Job job_1342207265272_0001 failed with state FAILED due to:
> ...
>                 Failed map tasks=19
>                 Launched map tasks=31
>
> The HTTP 400 error appears to be generated by the ShuffleHandler, which is
> configured to run on port 8080 of the slaves, and doesn't understand that
> URL. What I've been able to piece together so far is that /tasklog is
> handled by the TaskLogServlet, which is part of the TaskTracker. However,
> isn't this an MRv1 class that shouldn't even be running in my
> configuration? Also, the TaskTracker appears to run on port 50060, so I
> don't know where port 8080 is coming from.
>
> Though it could be a red herring, this warning seems to be related to the
> job failing, even though the job makes progress on the slave running the
> AM. The NodeManager logs on the AM and non-AM slaves appear fairly
> similar, and I don't see any errors in the non-AM logs.
>
> Another strange data point: These failures occur running the slaves on ARM
> systems. Running the slaves on x86 with the same configuration works. I'm
> using the same tarball on both, which means that the native-hadoop library
> isn't loaded on ARM. The master/client is the same x86 system in both
> scenarios. All nodes are running Ubuntu 12.04.
>
> Thanks for any guidance,
> Trevor
>
>
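To see at a glance which attempts failed and which host, port, and path the client tried for the task logs, the client output quoted above can be summarized with a short script. This is only a rough sketch: it assumes the log lines have been unwrapped and stripped of the "> " quote prefix, and the names `summarize` and `SAMPLE` are made up for illustration. It does not explain the failure; it only groups the evidence.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Patterns for the two kinds of client log lines quoted in the message.
FAILED_RE = re.compile(r"(attempt_\d+_\d+_[mr]_\d+_\d+), Status : FAILED")
URL_RE = re.compile(r"for URL: (http://\S+)")


def summarize(log_text):
    """Return (failed attempt ids, Counter of (host, port, path) endpoints)."""
    failed = FAILED_RE.findall(log_text)
    endpoints = Counter()
    for url in URL_RE.findall(log_text):
        parts = urlsplit(url)
        endpoints[(parts.hostname, parts.port, parts.path)] += 1
    return failed, endpoints


# Sample input: lines copied from the message, unwrapped (an assumption
# about how the original client output looked before the mail client wrapped it).
SAMPLE = """\
12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stderr
12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
"""

if __name__ == "__main__":
    failed, endpoints = summarize(SAMPLE)
    print("failed attempts:", failed)
    for (host, port, path), count in endpoints.items():
        print(f"{host}:{port}{path} requested {count} time(s)")
```

Run against the full client output, a summary like this would show whether every failing log fetch targets the same port and path, which is the pattern the message describes for port 8080 and /tasklog.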
+
Trevor 2012-07-17, 23:33
+
Trevor 2012-07-18, 00:33
+
Arun C Murthy 2012-07-18, 01:25
+
Arun C Murthy 2012-07-17, 23:04