MapReduce >> mail # user >> MRv2 jobs fail when run with more than one slave


Re: MRv2 jobs fail when run with more than one slave
On perfgb0n0, search the NodeManager logs for entries matching container_1342570404456_0001_* and check them for errors.
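
As a rough illustration of that search, here is a minimal Python sketch that filters log lines by container prefix and log level. It assumes the NodeManager log lines look like the ones quoted later in this thread (timestamp, level, class name, message). The function name, the set of levels treated as problems, and the way the log is read are all assumptions for illustration, not part of Hadoop.

```python
import re

# Levels treated as problems; this set is an assumption, adjust as needed.
PROBLEM_LEVELS = {"WARN", "ERROR", "FATAL"}

# Matches lines shaped like the NodeManager entries quoted in this thread:
# "2012-07-17 19:13:47,476 INFO some.Class: message"
LINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+) ")


def find_container_problems(lines, container_prefix):
    """Return log lines that mention container_prefix at a problem level."""
    hits = []
    for line in lines:
        if container_prefix not in line:
            continue
        match = LINE_RE.match(line)
        if match and match.group(1) in PROBLEM_LEVELS:
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open whichever NodeManager log file your installation writes and pass the file object together with the prefix "container_1342570404456_0001_". Continuation lines of a stack trace carry no timestamp and are skipped by this sketch, so read the surrounding lines of any hit in the log itself.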

Arun

On Jul 17, 2012, at 5:33 PM, Trevor wrote:

> Actually, the HTTP 400 is a red herring, and not the core issue. I added "-D mapreduce.client.output.filter=ALL" to the command line, and fetching the task output fails even for successful tasks:
>
> 12/07/17 19:15:55 INFO mapreduce.Job: Task Id : attempt_1342570404456_0001_m_000006_1, Status : SUCCEEDED
> 12/07/17 19:15:55 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342570404456_0001_m_000006_1&filter=stdout
>
> Having a better idea what to search for, I found that it's a recently fixed bug: https://issues.apache.org/jira/browse/MAPREDUCE-3889
>
> So the real question is how can I debug the failing tasks on the non-AM slave(s)? Although I see failure on the client:
>
> 12/07/17 19:14:35 INFO mapreduce.Job: Task Id : attempt_1342570404456_0001_m_000002_0, Status : FAILED
>
> I see what appears to be success on the slave:
>
> 2012-07-17 19:13:47,476 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1342570404456_0001_01_000002 succeeded
> 2012-07-17 19:13:47,477 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1342570404456_0001_01_000002 transitioned from RUNNING to EXITED_WITH_SUCCESS
>
> Suggestions of where to look next?
>
> Thanks,
> Trevor
>
> On Tue, Jul 17, 2012 at 6:33 PM, Trevor <[EMAIL PROTECTED]> wrote:
> Arun, I just verified that I get the same error with 2.0.0-alpha (official tarball) and 2.0.1-alpha (built from svn).
>
> Karthik, thanks for forwarding.
>
> Thanks,
> Trevor
>
>
> On Tue, Jul 17, 2012 at 6:18 PM, Karthik Kambatla <[EMAIL PROTECTED]> wrote:
> Forwarding your email to the cdh-user group.
>
> Thanks
> Karthik
>
>
> On Tue, Jul 17, 2012 at 2:24 PM, Trevor <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now for some strange reason, my MRv2 jobs (TeraGen, specifically) fail if I run with more than one slave. For every slave except the one running the Application Master, I get the following failed tasks and warnings repeatedly:
>
> 12/07/13 14:21:55 INFO mapreduce.Job: Running job: job_1342207265272_0001
> 12/07/13 14:22:17 INFO mapreduce.Job: Job job_1342207265272_0001 running in uber mode : false
> 12/07/13 14:22:17 INFO mapreduce.Job:  map 0% reduce 0%
> 12/07/13 14:22:46 INFO mapreduce.Job:  map 1% reduce 0%
> 12/07/13 14:22:52 INFO mapreduce.Job:  map 2% reduce 0%
> 12/07/13 14:22:55 INFO mapreduce.Job:  map 3% reduce 0%
> 12/07/13 14:22:58 INFO mapreduce.Job:  map 4% reduce 0%
> 12/07/13 14:23:04 INFO mapreduce.Job:  map 5% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stderr
> 12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000003_0&filter=stdout
> ...
> 12/07/13 14:25:12 INFO mapreduce.Job:  map 25% reduce 0%
> 12/07/13 14:25:12 INFO mapreduce.Job: Job job_1342207265272_0001 failed with state FAILED due to:
> ...
>                 Failed map tasks=19
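
The client output quoted above names each failed attempt, and the attempt ID shares its cluster timestamp and application number with the container IDs in the NodeManager log. The Python sketch below pulls the FAILED attempt IDs out of saved client output and derives the container prefix to search for. The function names are hypothetical, and the line format is assumed to match the "Task Id : ..., Status : ..." lines shown in this thread.

```python
import re

# Matches the client lines quoted in this thread, for example:
# "... INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED"
TASK_RE = re.compile(r"Task Id : (attempt_\d+_\d+_[mr]_\d+_\d+), Status : (\w+)")

# Splits an attempt ID into cluster timestamp and application number.
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_[mr]_\d+_\d+$")


def failed_attempts(lines):
    """Return the attempt IDs reported with Status : FAILED, in order."""
    failed = []
    for line in lines:
        match = TASK_RE.search(line)
        if match and match.group(2) == "FAILED":
            failed.append(match.group(1))
    return failed


def container_prefix(attempt_id):
    """Return the container ID prefix that belongs to the same application."""
    match = ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("not an attempt ID: " + attempt_id)
    return "container_{}_{}_".format(match.group(1), match.group(2))
```

For attempt_1342570404456_0001_m_000002_0 this yields container_1342570404456_0001_, the same prefix as in the reply at the top of the thread. Only the prefix is derived because, as far as I know, the trailing container number is assigned independently of the task number, so container_..._000002 is not necessarily the container that ran task m_000002.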

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/