MapReduce >> mail # dev >> INFO org.apache.hadoop.mapred.TaskTracker: attempt_XXXX NaN%


Re: INFO org.apache.hadoop.mapred.TaskTracker: attempt_XXXX NaN%

The NaN is very suspicious, perhaps a bug - we'll need more information to say.

But irrespective of that, are you sending periodic updates from your map/reduce code? The framework uses the 10-minute timeout to kill hung tasks; user code can report progress via the Reporter interface to avoid these task failures.
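To illustrate the pattern: Hadoop's org.apache.hadoop.mapred.Reporter extends the Progressable interface, and calling progress() (or setStatus()/incrCounter()) resets the task's inactivity timer. Below is a self-contained sketch of the idea with a local stand-in interface; the class, method, and interval names are illustrative, not Hadoop's own.

```java
// Sketch of the progress-reporting pattern Reporter provides. In real Hadoop
// code, the framework hands your map()/reduce() a Reporter and you call
// reporter.progress() periodically; here a local interface models that.
interface Progressable {
    void progress();
}

public class ProgressSketch {
    // Hypothetical batch size: report liveness every 1000 records so the
    // framework never sees 10 minutes of silence from a long-running task.
    static final int REPORT_INTERVAL = 1000;

    static int processAll(int totalRecords, Progressable reporter) {
        int reported = 0;
        for (int i = 0; i < totalRecords; i++) {
            // ... expensive per-record work would happen here ...
            if (i % REPORT_INTERVAL == 0) {
                reporter.progress();   // resets the inactivity timeout
                reported++;
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        int reported = processAll(5000, () -> calls[0]++);
        System.out.println(reported);   // prints 5
    }
}
```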

HTH,

+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Dec 26, 2012, at 7:14 AM, Aaron Zimmerman wrote:

> Hi,
>  I'm new to hadoop, setting up a new cluster on hadoop 1.0.3 that currently
> only has 2 datanode/tasktrackers.  I'll be adding more soon, but I'm worried
> about something being configured incorrectly. When I run a moderately
> expensive map reduce job (via pig), the job usually fails (though it does
> succeed 1/8 times or so).
>
> ERROR 2997: Unable to recreate exception from backed error: Task
> attempt_201212171952_0406_m_000020_3 failed to report status for 601
> seconds. Killing!
>
> Any time a job runs on the cluster, both task tracker logs output line after
> line of
> INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201212171952_0411_m_000000_0 NaN%, with different attempt
> identifiers.  
>
> Interspersed with these entries are lines like,
> org.apache.hadoop.mapred.TaskTracker: attempt_201212171952_0411_r_000000_0
> 0.1851852% reduce > copy (5 of 9 at 0.00 MB/s) >
>
> Which makes it look to me like some of the tasks are working, but some of
> the tasks just stall out, and perhaps they eventually timeout the entire
> job?
>
> So maybe my job is just too labor-intensive for the cluster, but the task
> tracker log entry seems odd, like something is wrong.  Why would it say
> NaN%?  I know that I can extend the timeout allotment, but I'd rather not do
> that as a permanent solution.  Is there any other config that I could
> update?  Has anyone seen that task tracker line before?  I can't find
> anything about it via Google, etc.
>
> Thanks,
>
> Aaron Zimmerman
>
>
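For reference, the 601-second kill described above corresponds to the mapred.task.timeout property (600000 ms by default in Hadoop 1.x). If extending it is ever needed, it goes in mapred-site.xml, e.g. (the 20-minute value here is only an example, and this treats the symptom rather than the cause):

```xml
<!-- mapred-site.xml: task inactivity timeout, in milliseconds -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value> <!-- 20 minutes instead of the default 10 -->
</property>
```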
