Re: MRv2 jobs fail when run with more than one slave
Forwarding your email to the cdh-user group.

Thanks
Karthik

On Tue, Jul 17, 2012 at 2:24 PM, Trevor <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now for some
> strange reason, my MRv2 jobs (TeraGen, specifically) fail if I run with
> more than one slave. For every slave except the one running the Application
> Master, I get the following failed tasks and warnings repeatedly:
>
> 12/07/13 14:21:55 INFO mapreduce.Job: Running job: job_1342207265272_0001
> 12/07/13 14:22:17 INFO mapreduce.Job: Job job_1342207265272_0001 running in uber mode : false
> 12/07/13 14:22:17 INFO mapreduce.Job:  map 0% reduce 0%
> 12/07/13 14:22:46 INFO mapreduce.Job:  map 1% reduce 0%
> 12/07/13 14:22:52 INFO mapreduce.Job:  map 2% reduce 0%
> 12/07/13 14:22:55 INFO mapreduce.Job:  map 3% reduce 0%
> 12/07/13 14:22:58 INFO mapreduce.Job:  map 4% reduce 0%
> 12/07/13 14:23:04 INFO mapreduce.Job:  map 5% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stderr
> 12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000003_0&filter=stdout
> ...
> 12/07/13 14:25:12 INFO mapreduce.Job:  map 25% reduce 0%
> 12/07/13 14:25:12 INFO mapreduce.Job: Job job_1342207265272_0001 failed with state FAILED due to:
> ...
>                 Failed map tasks=19
>                 Launched map tasks=31
>
> The HTTP 400 error appears to be generated by the ShuffleHandler, which is
> configured to run on port 8080 of the slaves, and doesn't understand that
> URL. What I've been able to piece together so far is that /tasklog is
> handled by the TaskLogServlet, which is part of the TaskTracker. However,
> isn't this an MRv1 class that shouldn't even be running in my
> configuration? Also, the TaskTracker appears to run on port 50060, so I
> don't know where port 8080 is coming from.
>
> Though it could be a red herring, this warning seems to be related to the
> job failing, despite the fact that the job makes progress on the slave
> running the AM. The Node Manager logs on both AM and non-AM slaves appear
> fairly similar, and I don't see any errors in the non-AM logs.
>
> Another strange data point: These failures occur running the slaves on ARM
> systems. Running the slaves on x86 with the same configuration works. I'm
> using the same tarball on both, which means that the native-hadoop library
> isn't loaded on ARM. The master/client is the same x86 system in both
> scenarios. All nodes are running Ubuntu 12.04.
>
> Thanks for any guidance,
> Trevor
>
>
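
For context on the port question above: in Hadoop 2.0.0-era MRv2 the ShuffleHandler runs inside each NodeManager as an auxiliary service, and its port is configured independently of the MRv1 daemons. A minimal sketch of the slave-side configuration that would put it on port 8080, assuming the default property names of that release (the poster's actual files are not shown in the thread):

    <!-- yarn-site.xml on each slave (sketch; property names assumed from Hadoop 2.0.0 defaults) -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce.shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <!-- 8080 matches the port in the failing tasklog URLs above -->
      <name>mapreduce.shuffle.port</name>
      <value>8080</value>
    </property>

As Trevor notes, the /tasklog path belongs to the MRv1 TaskLogServlet (part of the TaskTracker, default port 50060), so a client that builds those URLs against the shuffle port would see exactly the 400 responses shown in the log.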
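On the native-library data point: one way to confirm per node whether libhadoop actually loads is to ask NativeCodeLoader directly. This is only an illustrative Java check, assuming hadoop-common on the classpath; it is not something from the original thread:

    // NativeCheck.java - minimal sketch; run with the same Hadoop classpath as the jobs.
    import org.apache.hadoop.util.NativeCodeLoader;

    public class NativeCheck {
        public static void main(String[] args) {
            // NativeCodeLoader attempts to load libhadoop from java.library.path
            // the first time the class is initialized.
            System.out.println("native-hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
        }
    }

Running the same check on an ARM and an x86 slave would at least confirm the loading difference Trevor describes, independently of the job failures.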