INFO org.apache.hadoop.mapred.TaskTracker: attempt_XXXX NaN%
Hi,
  I'm new to Hadoop, setting up a new cluster on Hadoop 1.0.3 that currently
has only 2 datanode/tasktracker machines. I'll be adding more soon, but I'm
worried that something is configured incorrectly. When I run a moderately
expensive MapReduce job (via Pig), the job usually fails (though it does
succeed about 1 in 8 times).

ERROR 2997: Unable to recreate exception from backed error: Task
attempt_201212171952_0406_m_000020_3 failed to report status for 601
seconds. Killing!

Any time a job runs on the cluster, both TaskTracker logs emit line after
line of

INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201212171952_0411_m_000000_0 NaN%

with different attempt identifiers.

Interspersed with these entries are lines like

org.apache.hadoop.mapred.TaskTracker: attempt_201212171952_0411_r_000000_0
0.1851852% reduce > copy (5 of 9 at 0.00 MB/s) >

which makes it look to me like some of the tasks are working while others
just stall out, and perhaps those eventually time out the entire job?

So maybe my job is just too labor-intensive for the cluster, but the
TaskTracker log entry seems odd, as if something is wrong. Why would it say
NaN%? I know that I can extend the timeout allotment, but I'd rather not do
that as a permanent solution. Is there any other config I could update? Has
anyone seen that TaskTracker line before? I can't find anything about it via
Google, etc.
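For reference, the timeout I mean is (as far as I understand) mapred.task.timeout in mapred-site.xml, which on Hadoop 1.x is in milliseconds and defaults to 600000, i.e. the ~600 seconds after which my attempts are being killed. Raising it would look something like this (the 20-minute value here is just an example):

```xml
<!-- mapred-site.xml: how long a task may go without reporting progress
     before the TaskTracker kills it, in milliseconds.
     Default is 600000 (10 minutes); 0 disables the timeout entirely. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>
```

But as I said, I'd rather understand why the tasks stop reporting progress than just paper over it with a longer timeout.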

Thanks,

Aaron Zimmerman