Hadoop >> mail # general >> Lost tasktracker errors


Lost tasktracker errors
I'm running a job over a 380-billion-row, 20 TB dataset that computes sum(),
max(), etc. The job runs fine at around 3 million rows per second for several
hours, then grinds to a halt as it loses one tasktracker after another. We see
a healthy mix of successful map and reduce attempts on the tasktracker...

2013-01-03 23:41:40,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041109_0 1.0%
2013-01-03 23:41:40,256 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041105_0 1.0%
2013-01-03 23:41:40,260 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041105_0 1.0%
2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201301031813_0001_m_041105_0 is done.
2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201301031813_0001_m_041105_0  was 111
2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 8
2013-01-03 23:41:40,374 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041106_0 0.9884119%
2013-01-03 23:41:40,432 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201301031813_0001_m_2021872807 exited with exit code 0. Number of tasks it ran: 1
2013-01-03 23:41:40,807 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041103_0 0.9884134%
2013-01-03 23:41:43,190 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041101_0 1.0%
2013-01-03 23:41:43,193 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041101_0 1.0%
2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201301031813_0001_m_041101_0 is done.
2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201301031813_0001_m_041101_0  was 111
2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 9
2013-01-03 23:41:43,303 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041109_0 1.0%
2013-01-03 23:41:43,306 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041109_0 1.0%
2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201301031813_0001_m_041109_0 is done.
2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201301031813_0001_m_041109_0  was 111
2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 10
2013-01-03 23:41:43,361 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201301031813_0001_m_36690963 exited with exit code 0. Number of tasks it ran: 1
2013-01-03 23:41:43,428 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041106_0 1.0%
2013-01-03 23:41:43,432 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041106_0 1.0%
2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201301031813_0001_m_041106_0 is done.
2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201301031813_0001_m_041106_0  was 111
2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 11
2013-01-03 23:41:43,457 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201301031813_0001_m_-2095784622 exited with exit code 0. Number of tasks it ran: 1
2013-01-03 23:41:43,595 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201301031813_0001_m_1190449426 exited with exit code 0. Number of tasks it ran: 1
2013-01-03 23:41:43,862 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041103_0 1.0%
2013-01-03 23:41:43,866 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_m_041103_0 1.0%
2013-01-03 23:41:43,867 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201301031813_0001_m_041103_0 is done.
2013-01-03 23:41:43,867 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201301031813_0001_m_041103_0  was 111
2013-01-03 23:41:43,867 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 12
2013-01-03 23:41:44,021 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201301031813_0001_m_-505301168 exited with exit code 0. Number of tasks it ran: 1
2013-01-03 23:41:45,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050235882% reduce > copy (40975 of 271884 at 0.00 MB/s) >
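
The reduce status line can be sanity-checked with a little arithmetic. Below is a minimal Python sketch, assuming the usual Hadoop 1 weighting of a reduce attempt into three equal phases (copy, sort, reduce). Under that assumption, copied outputs divided by total map outputs and then by 3 reproduces the logged 0.050235882 to within float rounding. Together with the map attempts that log "1.0%" when done, this suggests the number before the "%" is a fraction: the reducer is about 5% through overall and has fetched about 15% of the map outputs. The regex and function names are hypothetical helpers, not part of Hadoop.

```python
import re

# Matches the reduce shuffle status as it appears in the TaskTracker log above,
# e.g. "0.050235882% reduce > copy (40975 of 271884 at 0.00 MB/s) >".
COPY_RE = re.compile(
    r"(?P<progress>[\d.]+)% reduce > copy \((?P<copied>\d+) of (?P<total>\d+) at "
)


def parse_copy_status(line):
    """Return (logged_progress, copied, total) for a reduce copy line, else None."""
    m = COPY_RE.search(line)
    if m is None:
        return None
    return float(m.group("progress")), int(m.group("copied")), int(m.group("total"))


def expected_progress(copied, total):
    # Assumption: copy is the first of three equally weighted reduce phases,
    # so while copying, overall progress = (copied / total) * 1/3.
    return copied / total / 3.0


line = ("2013-01-03 23:41:45,539 INFO org.apache.hadoop.mapred.TaskTracker: "
        "attempt_201301031813_0001_r_000000_0 0.050235882% reduce > copy "
        "(40975 of 271884 at 0.00 MB/s) >")
logged, copied, total = parse_copy_status(line)
print(f"fetched {copied}/{total} map outputs = {copied / total:.2%}")
print(f"logged {logged:.9f}, expected {expected_progress(copied, total):.9f}")
```

The same check applied to the later "41010 of 271884" lines gives about 0.0502788, again matching the logged 0.050278794.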

Then it seems to get stuck on reduce attempts before exiting with a SIGTERM
(143) exit code...

2013-01-03 23:50:49,642 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:50:55,678 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:51:01,717 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:51:04,755 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:51:10,796 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:51:16,831 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:51:19,870 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:51:25,911 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy (41010 of 271884 at 0.00 MB/s) >
2013-01-03 23:51:31,953
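
The frozen shuffle counter and the 143 exit code can both be checked mechanically when triaging logs like these. The sketch below is a minimal Python example, assuming the TaskTracker line format quoted in this thread; the function names are hypothetical. It measures how long the "copied" count has gone unchanged across the lines it is given, and reads 143 with the common shell convention that an exit status of 128 + N means the process was killed by signal N (SIGTERM is signal 15, so 128 + 15 = 143). It only flags the symptom; it does not say why the fetches stopped.

```python
import re
import signal
from datetime import datetime

# Timestamp prefix and shuffle counters, in the TaskTracker log format above.
COPY_LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"reduce > copy \((?P<copied>\d+) of (?P<total>\d+) at "
)


def copy_samples(lines):
    """Yield (timestamp, copied, total) for every reduce copy status line."""
    for line in lines:
        m = COPY_LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, int(m.group("copied")), int(m.group("total"))


def stalled_seconds(lines):
    """Seconds from the last change in the copied count to the newest sample."""
    samples = list(copy_samples(lines))
    if not samples:
        return 0.0
    last_change, prev = samples[0][0], samples[0][1]
    for ts, copied, _total in samples[1:]:
        if copied != prev:
            last_change, prev = ts, copied
    return (samples[-1][0] - last_change).total_seconds()


def describe_exit_code(code):
    """Read an exit status using the shell convention 128 + N = killed by signal N."""
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exit status {code}"


log = [
    "2013-01-03 23:41:45,539 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201301031813_0001_r_000000_0 0.050235882% reduce > copy "
    "(40975 of 271884 at 0.00 MB/s) >",
    "2013-01-03 23:50:49,642 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy "
    "(41010 of 271884 at 0.00 MB/s) >",
    "2013-01-03 23:51:25,911 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy "
    "(41010 of 271884 at 0.00 MB/s) >",
]
print(stalled_seconds(log))
print(describe_exit_code(143))  # killed by SIGTERM
```

On these three lines the count moves from 40975 to 41010 between 23:41:45 and 23:50:49 (only 35 map outputs in about nine minutes) and then does not move at all for the rest of the visible window.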

Later messages in this thread:
Robert Evans 2013-01-04, 14:34
Royston Sellman 2013-01-04, 15:02
Robert Evans 2013-01-04, 15:16
Royston Sellman 2013-01-04, 18:04
Jeff Bean 2013-01-07, 21:03