MapReduce (dev list): INFO org.apache.hadoop.mapred.TaskTracker: attempt_XXXX NaN%


Re: INFO org.apache.hadoop.mapred.TaskTracker: attempt_XXXX NaN%
What information can I get you?

I'm not sending periodic updates; I'm just using Pig to generate the
MapReduce jobs, so perhaps something I'm doing is preventing Pig from
reporting progress?

The map task itself is querying HBase, essentially a full table scan; the
table currently has 9 regions, so there are 9 map tasks.
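
(If it helps to see the mapping concretely: as far as I know, Pig's
HBaseStorage loader sits on top of HBase's TableInputFormat, which creates
one input split, and therefore one map task, per region. A rough stand-alone
sketch of the equivalent job is below; the class names, the table name
"mytable", and the timeout value are made up for illustration.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical stand-alone version of the full table scan. A 9-region table
// yields 9 map tasks because TableInputFormat makes one split per region.
public class FullScanJob {

  static class ScanMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result columns, Context context) {
      // Just count rows; nothing is written out in this sketch.
      context.getCounter("scan", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // The property behind "failed to report status for 600 seconds"
    // (default 600000 ms). In a Pig script the same property can be set
    // with:  set mapred.task.timeout '1200000';  (example value only)
    conf.setLong("mapred.task.timeout", 1200000L);

    Job job = new Job(conf, "full-table-scan");    // Hadoop 1.x style Job setup
    job.setJarByClass(FullScanJob.class);

    Scan scan = new Scan();                        // no start/stop row => full scan
    scan.setCaching(500);
    scan.setCacheBlocks(false);                    // don't pollute the block cache

    // "mytable" is a placeholder for the real table name.
    TableMapReduceUtil.initTableMapperJob("mytable", scan, ScanMapper.class,
        Text.class, NullWritable.class, job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}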

I don't see that line in the task logs, just in the TaskTracker log while
the tasks are running.

It looks to me like the tasks that report NaN progress eventually fail and
are retried, and the retry succeeds.

http://hastebin.com/xuhugipuve.sm - task fails

http://hastebin.com/vusadavano.vhdl - task succeeds
On 12/26/12 1:21 PM, "Vinod Kumar Vavilapalli" <[EMAIL PROTECTED]>
wrote:

>
>The NaN is very suspicious, perhaps a bug; we'll need more information.
>
>But regardless, are you sending periodic updates from your map/reduce
>code? The framework has a 10-minute timeout to avoid hung tasks, so
>user code can report progress via the Reporter interface and avoid
>task failures (a minimal sketch of this appears below the quoted thread).
>
>HTH,
>
>+Vinod Kumar Vavilapalli
>Hortonworks Inc.
>http://hortonworks.com/
>
>On Dec 26, 2012, at 7:14 AM, Aaron Zimmerman wrote:
>
>> Hi,
>> I'm new to Hadoop, setting up a new cluster on Hadoop 1.0.3 that
>> currently has only 2 datanode/tasktrackers.  I'll be adding more soon,
>> but I'm worried about something being configured incorrectly.  When I
>> run a moderately expensive MapReduce job (via Pig), the job usually
>> fails (though it does succeed 1/8 times or so).
>>
>> ERROR 2997: Unable to recreate exception from backed error: Task
>> attempt_201212171952_0406_m_000020_3 failed to report status for 601
>> seconds. Killing!
>>
>> Any time a job runs on the cluster, both TaskTracker logs output line
>> after line of
>> INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201212171952_0411_m_000000_0 NaN%, with different attempt
>> identifiers.
>>
>> Interspersed with these entries are lines like,
>> org.apache.hadoop.mapred.TaskTracker:
>> attempt_201212171952_0411_r_000000_0
>> 0.1851852% reduce > copy (5 of 9 at 0.00 MB/s) >
>>
>> This makes it look to me like some of the tasks are working, but some of
>> the tasks just stall out, and perhaps they eventually time out and fail
>> the entire job?
>>
>> So maybe my job is just too labor-intensive for the cluster, but the
>> TaskTracker log entry seems odd, like something is wrong.  Why would it
>> say NaN%?  I know that I can extend the timeout allotment, but I'd
>> rather not do that as a permanent solution.  Is there any other config
>> that I could update?  Has anyone seen that TaskTracker line before?  I
>> can't find anything about it via Google, etc.
>>
>> Thanks,
>>
>> Aaron Zimmerman
>>
>>
>
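
For anyone finding this thread later: below is a minimal sketch of the
progress reporting Vinod describes, using the old org.apache.hadoop.mapred
API that Hadoop 1.0.3 ships with. It only illustrates the Reporter
interface, not the code Pig generates; the KeepAliveMapper class and the
slowLookup() step are hypothetical.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Old-API mapper that keeps the TaskTracker informed during slow per-record work.
public class KeepAliveMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Tell the framework this task is alive even though no output has been
    // emitted yet; this is what resets the mapred.task.timeout clock.
    reporter.progress();
    reporter.setStatus("processing record at offset " + key.get());

    String result = slowLookup(value.toString());  // hypothetical long-running step
    output.collect(value, new Text(result));
  }

  private String slowLookup(String row) {
    // Placeholder for expensive work, e.g. a remote lookup or a wide scan.
    return row;
  }
}

Since Pig generates the map code, there is no obvious place to call
Reporter from user code; the mapred.task.timeout property mentioned in the
earlier sketch is the usual knob in that case.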