Reduces Failing with 'Child Error'
Hi guys,
My reduces keep failing and I can't figure out what is going on, which is
quite frustrating.
Could you help me? Any ideas? I'm sending some info below; let me know if you
need more.

Regards
Tomas

I'm running a cluster of 10 slaves on EC2 m1.xlarge instances, with an
attached 80 GB EBS volume for data, plus the ephemeral storage for local mapred

Instance Family:          General purpose
Instance Type:            m1.xlarge
Processor Arch:           64-bit
vCPU:                     4
ECU:                      8
Memory (GiB):             15
Instance Storage (GB):    4 x 420
EBS-optimized Available:  Yes
Network Performance:      High

Job

Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     245        0        0        221       24      41 / 77
reduce  100.00%     68         0        0        33        35      55 / 43

attempt_201307181643_0007_r_000000_0  task_201307181643_0007_r_000000
FAILED

java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 137.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

task_201307181643_0007_r_000001
FAILED

java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 137.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201307181643_0007_r_000001_1  task_201307181643_0007_r_000001
FAILED

java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 137.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201307181643_0007_r_000002_0  task_201307181643_0007_r_000002
FAILED

java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 137.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
....
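
A note on reading the traces above: by the usual POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 would correspond to signal 9 (SIGKILL). The sketch below just decodes that convention. The helper name `decode_exit_status` is hypothetical, and the result does not say who sent the signal; the kernel OOM killer is one common suspect for a killed child JVM, but nothing shown here confirms it.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    Statuses above 128 are read as "killed by signal (status - 128)",
    following the common POSIX shell convention.
    """
    if status > 128:
        signum = status - 128
        try:
            # Map the number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number has no name on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 137 is the status reported in the 'Child Error' traces.
    print(decode_exit_status(137))
```

If the decoded signal is SIGKILL, checking the kernel log on the affected slave for out-of-memory messages around the failure time is a reasonable next step.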

The tasktracker log is as follows:

hadoop@cluster-slaves-00:/opt/hadoop/hadoop-1.0.3/logs$ grep "attempt_201307181643_0007_r_000063_0" hadoop-hadoop-tasktracker-cluster-slaves-00.log
2013-07-18 17:10:57,401 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201307181643_0007_r_000063_0 task's state:UNASSIGNED
2013-07-18 17:10:57,401 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201307181643_0007_r_000063_0 which needs 1 slots
2013-07-18 17:10:57,402 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201307181643_0007_r_000063_0 which needs 1 slots
2013-07-18 17:10:57,549 INFO org.apache.hadoop.mapred.JvmManager: No new JVM spawned for jobId/taskid: job_201307181643_0007/attempt_201307181643_0007_r_000063_0. Attempting to reuse: jvm_201307181643_0007_r_-901518427
2013-07-18 17:10:57,748 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201307181643_0007_r_-901518427 given task: attempt_201307181643_0007_r_000063_0
2013-07-18 17:11:04,658 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.0% reduce > copy >
2013-07-18 17:11:11,139 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.02312925% reduce > copy (17 of 245 at 7.61 MB/s) >
2013-07-18 17:11:14,233 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.04217687% reduce > copy (31 of 245 at 10.66 MB/s) >
2013-07-18 17:11:17,274 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.0585034% reduce > copy (43 of 245 at 12.29 MB/s) >
2013-07-18 17:11:20,550 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.07755102% reduce > copy (57 of 245 at 14.04 MB/s) >
2013-07-18 17:11:24,342 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.10748299% reduce > copy (79 of 245 at 16.99 MB/s) >
2013-07-18 17:11:27,417 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.13469388% reduce > copy (99 of 245 at 18.17 MB/s) >
2013-07-18 17:11:30,502 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.14829932% reduce > copy (109 of 245 at 18.03 MB/s) >
2013-07-18 17:11:33,605 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.17687075% reduce > copy (130 of 245 at 19.56 MB/s) >
2013-07-18 17:11:37,243 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.20408162% reduce > copy (150 of 245 at 20.46 MB/s) >
2013-07-18 17:11:40,321 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.24081632% reduce > copy (177 of 245 at 21.61 MB/s) >
2013-07-18 17:11:43,395 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.27074832% reduce > copy (199 of 245 at 22.40 MB/s) >
2013-07-18 17:11:46,497 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.28163266% reduce > copy (207 of 245 at 21.74 MB/s) >
2013-07-18 17:11:49,570 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.28435373% reduce > copy (209 of 245 at 21.07 MB/s) >
2013-07-18 17:11:52,634 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.28571427% reduce > copy (210 of 245 at 19.60 MB/s) >
2013-07-18 17:11:55,696 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.28843537% reduce > copy (212 of 245 at 18.74 MB/s) >
2013-07-18 17:11:58,773 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.29523808% reduce > copy (217 of 245 at 18.52 MB/s) >
2013-07-18 17:12:01,863 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.29523808% reduce > copy (217 of 245 at 18.52 MB/s) >
2013-07-18 17:12:04,928 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307181643_0007_r_000063_0 0.29795918% reduce > copy (219 of 245 at 16.70 MB/s) >
2013-07-18 17:1