

Royston Sellman 2013-01-04, 11:52
Re: Lost tasktracker errors
Is there anything in the task tracker's logs?  Did the machines go down?
Are there full disks on those nodes?

--Bobby
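
The full-disk question above can be checked with a few lines of Python on each affected node. This is only a minimal sketch: the directory list and the 10% threshold are assumptions chosen for illustration, and in practice you would substitute whatever your cluster uses for mapred.local.dir, hadoop.tmp.dir and the log directory.

```python
import shutil


def disk_report(paths, min_free_fraction=0.10):
    """Return a list of (path, free_fraction, is_low) for each readable path.

    min_free_fraction is an arbitrary illustrative threshold, not a Hadoop default.
    """
    report = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            # Path does not exist or cannot be read on this node; skip it.
            continue
        if usage.total == 0:
            # Avoid dividing by zero on pseudo-filesystems.
            continue
        free_fraction = usage.free / usage.total
        report.append((path, free_fraction, free_fraction < min_free_fraction))
    return report


if __name__ == "__main__":
    # Hypothetical directories; replace with the node's real data and log dirs.
    for path, free, low in disk_report(["/", "/tmp"]):
        flag = "LOW" if low else "ok"
        print(f"{path}: {free:.1%} free [{flag}]")
```

Running this on the nodes that dropped out, and comparing against a node that stayed up, should show quickly whether disk space is the difference between them.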

On 1/4/13 5:52 AM, "Royston Sellman" <[EMAIL PROTECTED]>
wrote:

>I'm running a job over a 380-billion-row, 20 TB dataset that computes
>sum(), max(), etc. The job runs fine at around 3 million rows per
>second for several hours, then grinds to a halt as it loses one
>tasktracker after another. We see a healthy mix of successful map and
>reduce attempts on the tasktracker...
>
>
>
>2013-01-03 23:41:40,249 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041109_0 1.0%
>
>2013-01-03 23:41:40,256 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041105_0 1.0%
>
>2013-01-03 23:41:40,260 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041105_0 1.0%
>
>2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: Task
>attempt_201301031813_0001_m_041105_0 is done.
>
>2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker:
>reported output size for attempt_201301031813_0001_m_041105_0  was 111
>
>2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker:
>addFreeSlot : current free slots : 8
>
>2013-01-03 23:41:40,374 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041106_0 0.9884119%
>
>2013-01-03 23:41:40,432 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>jvm_201301031813_0001_m_2021872807 exited with exit code 0. Number of tasks it ran: 1
>
>2013-01-03 23:41:40,807 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041103_0 0.9884134%
>
>2013-01-03 23:41:43,190 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041101_0 1.0%
>
>2013-01-03 23:41:43,193 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041101_0 1.0%
>
>2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker: Task
>attempt_201301031813_0001_m_041101_0 is done.
>
>2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker:
>reported output size for attempt_201301031813_0001_m_041101_0  was 111
>
>2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker:
>addFreeSlot : current free slots : 9
>
>2013-01-03 23:41:43,303 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041109_0 1.0%
>
>2013-01-03 23:41:43,306 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041109_0 1.0%
>
>2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker: Task
>attempt_201301031813_0001_m_041109_0 is done.
>
>2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker:
>reported output size for attempt_201301031813_0001_m_041109_0  was 111
>
>2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker:
>addFreeSlot : current free slots : 10
>
>2013-01-03 23:41:43,361 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>jvm_201301031813_0001_m_36690963 exited with exit code 0. Number of tasks it ran: 1
>
>2013-01-03 23:41:43,428 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041106_0 1.0%
>
>2013-01-03 23:41:43,432 INFO org.apache.hadoop.mapred.TaskTracker:
>attempt_201301031813_0001_m_041106_0 1.0%
>
>2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker: Task
>attempt_201301031813_0001_m_041106_0 is done.
>
>2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker:
>reported output size for attempt_201301031813_0001_m_041106_0  was 111
>
>2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker:
>addFreeSlot : current free slots : 11
>
>2013-01-03 23:41:43,457 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>jvm_201301031813_0001_m_-2095784622 exited with exit code 0. Number of tasks it ran: 1
>
>2013-01-03 23:41:43,595 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>jvm_201301031813_0001_m_1190449426 exited with exit code 0. Number of tasks it ran: 1
>
>2013-01-03 23:41:43,862 INFO org.apache.hadoop.mapred.TaskTracker:
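
Every line in the excerpt above is at INFO level, so the useful entries, if any, are probably elsewhere in the tasktracker log. A rough way to find them is to tally the log levels and pull out anything that is not INFO. The sketch below assumes the unwrapped, on-disk log uses the same "timestamp LEVEL logger: message" layout seen in the excerpt; the function name and the log path are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# 2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : ...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (\S+): ?(.*)$"
)


def summarize(lines):
    """Count entries per log level and collect every non-INFO entry."""
    levels = Counter()
    problems = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            # Continuation lines (for example stack traces) are skipped here.
            continue
        timestamp, level, logger, message = match.groups()
        levels[level] += 1
        if level != "INFO":
            problems.append((timestamp, level, logger, message))
    return levels, problems


if __name__ == "__main__":
    # Hypothetical path; point this at the real tasktracker log on the node.
    with open("tasktracker.log", encoding="utf-8", errors="replace") as handle:
        levels, problems = summarize(handle)
    print(dict(levels))
    for entry in problems:
        print(*entry)
```

Because continuation lines are skipped, this only locates the first line of each WARN, ERROR or FATAL entry; the surrounding stack trace still has to be read in the log itself.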
Royston Sellman 2013-01-04, 15:02
Robert Evans 2013-01-04, 15:16
Royston Sellman 2013-01-04, 18:04
Jeff Bean 2013-01-07, 21:03