MapReduce >> mail # user >> MRv2 jobs fail when run with more than one slave


Trevor 2012-07-17, 21:24
Karthik Kambatla 2012-07-17, 23:18
Trevor 2012-07-17, 23:33
Trevor 2012-07-18, 00:33
Re: MRv2 jobs fail when run with more than one slave
Check the NodeManager logs on perfgb0n0 for entries matching container_1342570404456_0001_* and look for errors in them.
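
If the log is large, filtering it by container ID prefix may be quicker than reading it by eye. Below is a minimal Python sketch of that idea, not a Hadoop tool. The keyword list (WARN, ERROR, FATAL, Exception) is an assumption about what counts as a problem, and two of the three sample lines are made up for illustration. In practice you would pass in the lines of the actual NodeManager log file, whose location depends on your installation.

```python
import re
from typing import Iterable, List

# Levels/keywords treated as "worth a look". This list is an assumption.
PROBLEM = re.compile(r"\b(WARN|ERROR|FATAL)\b|Exception")


def container_problems(lines: Iterable[str], container_prefix: str) -> List[str]:
    """Return lines that mention container_prefix and match PROBLEM."""
    return [
        line.rstrip("\n")
        for line in lines
        if container_prefix in line and PROBLEM.search(line)
    ]


sample = [
    # Taken from the NodeManager output quoted in this thread.
    "2012-07-17 19:13:47,476 INFO org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.launcher.ContainerLaunch: "
    "Container container_1342570404456_0001_01_000002 succeeded",
    # Hypothetical line, made up purely to exercise the filter.
    "2012-07-17 19:13:48,000 WARN example.Hypothetical: "
    "container_1342570404456_0001_01_000003 made-up warning",
    # Hypothetical line for a different application; it should be ignored.
    "2012-07-17 19:13:49,000 ERROR example.Hypothetical: "
    "container_9999999999999_0001_01_000001 made-up error",
]

for hit in container_problems(sample, "container_1342570404456_0001_"):
    print(hit)
```

Only the made-up WARN line for this application is printed. The INFO line and the line for the other application are skipped.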

Arun

On Jul 17, 2012, at 5:33 PM, Trevor wrote:

> Actually, the HTTP 400 is a red herring, and not the core issue. I added "-D mapreduce.client.output.filter=ALL" to the command line, and fetching the task output fails even for successful tasks:
>
> 12/07/17 19:15:55 INFO mapreduce.Job: Task Id : attempt_1342570404456_0001_m_000006_1, Status : SUCCEEDED
> 12/07/17 19:15:55 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342570404456_0001_m_000006_1&filter=stdout
>
> Having a better idea what to search for, I found that it's a recently fixed bug: https://issues.apache.org/jira/browse/MAPREDUCE-3889
>
> So the real question is how can I debug the failing tasks on the non-AM slave(s)? Although I see failure on the client:
>
> 12/07/17 19:14:35 INFO mapreduce.Job: Task Id : attempt_1342570404456_0001_m_000002_0, Status : FAILED
>
> I see what appears to be success on the slave:
>
> 2012-07-17 19:13:47,476 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1342570404456_0001_01_000002 succeeded
> 2012-07-17 19:13:47,477 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1342570404456_0001_01_000002 transitioned from RUNNING to EXITED_WITH_SUCCESS
>
> Suggestions of where to look next?
>
> Thanks,
> Trevor
>
> On Tue, Jul 17, 2012 at 6:33 PM, Trevor <[EMAIL PROTECTED]> wrote:
> Arun, I just verified that I get the same error with 2.0.0-alpha (official tarball) and 2.0.1-alpha (built from svn).
>
> Karthik, thanks for forwarding.
>
> Thanks,
> Trevor
>
>
> On Tue, Jul 17, 2012 at 6:18 PM, Karthik Kambatla <[EMAIL PROTECTED]> wrote:
> Forwarding your email to the cdh-user group.
>
> Thanks
> Karthik
>
>
> On Tue, Jul 17, 2012 at 2:24 PM, Trevor <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now for some strange reason, my MRv2 jobs (TeraGen, specifically) fail if I run with more than one slave. For every slave except the one running the Application Master, I get the following failed tasks and warnings repeatedly:
>
> 12/07/13 14:21:55 INFO mapreduce.Job: Running job: job_1342207265272_0001
> 12/07/13 14:22:17 INFO mapreduce.Job: Job job_1342207265272_0001 running in uber mode : false
> 12/07/13 14:22:17 INFO mapreduce.Job:  map 0% reduce 0%
> 12/07/13 14:22:46 INFO mapreduce.Job:  map 1% reduce 0%
> 12/07/13 14:22:52 INFO mapreduce.Job:  map 2% reduce 0%
> 12/07/13 14:22:55 INFO mapreduce.Job:  map 3% reduce 0%
> 12/07/13 14:22:58 INFO mapreduce.Job:  map 4% reduce 0%
> 12/07/13 14:23:04 INFO mapreduce.Job:  map 5% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stderr
> 12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000003_0&filter=stdout
> ...
> 12/07/13 14:25:12 INFO mapreduce.Job:  map 25% reduce 0%
> 12/07/13 14:25:12 INFO mapreduce.Job: Job job_1342207265272_0001 failed with state FAILED due to:
> ...
>                 Failed map tasks=19
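
To see at a glance which attempts the client reports as failed, the "Task Id : ..., Status : ..." lines quoted above can be grouped by status. The following is a minimal Python sketch that assumes the client prints status lines exactly in the format shown in this thread. The function name and regular expression are illustrative only and not part of Hadoop.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the client status lines as they appear in this thread.
STATUS_LINE = re.compile(r"Task Id : (attempt_\w+), Status : (\w+)")


def attempts_by_status(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group task attempt IDs by the status the client reported for them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = STATUS_LINE.search(line)
        if match:
            attempt_id, status = match.groups()
            grouped[status].append(attempt_id)
    return dict(grouped)


# Sample lines copied from the client output quoted in this thread.
client_output = [
    "12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%",
    "12/07/13 14:23:07 INFO mapreduce.Job: Task Id : "
    "attempt_1342207265272_0001_m_000004_0, Status : FAILED",
    "12/07/13 14:23:08 INFO mapreduce.Job: Task Id : "
    "attempt_1342207265272_0001_m_000003_0, Status : FAILED",
    "12/07/17 19:15:55 INFO mapreduce.Job: Task Id : "
    "attempt_1342570404456_0001_m_000006_1, Status : SUCCEEDED",
]

for status, attempts in attempts_by_status(client_output).items():
    print(status, attempts)
```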

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Arun C Murthy 2012-07-17, 23:04