Re: Map/Reduce Tasks Fails
Raj Vishwanathan 2012-05-22, 14:50
>From: Harsh J <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Sent: Tuesday, May 22, 2012 7:13 AM
>Subject: Re: Map/Reduce Tasks Fails
>Is the same DN 10.0.25.149 reported across all failures? And do you
>notice any machine patterns when observing the failed tasks (i.e. are
>they clumped on any one or a few particular TTs repeatedly)?
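One way to answer the question above is to tally which datanode addresses show up in the DFSClient messages across all the task logs. Below is a minimal sketch, assuming the logs have been collected into plain text; the function name and the sample text are only illustrative, and the regular expressions are based on the two message formats visible in the excerpt quoted further down.

```python
import re
from collections import Counter

# Patterns modeled on the DFSClient messages quoted in this thread.
# \s+ is used so that wrapped lines (as in mail archives) still match.
ADDR = r"(\d{1,3}(?:\.\d{1,3}){3}:\d+)"
EXCLUDED = re.compile(r"Excluding\s+datanode\s+" + ADDR)
PIPELINE_ERROR = re.compile(r"Exception\s+in\s+createBlockOutputStream\s+" + ADDR)


def tally_datanodes(log_text):
    """Count how often each datanode address appears in each message type."""
    return {
        "excluded": Counter(EXCLUDED.findall(log_text)),
        "pipeline_errors": Counter(PIPELINE_ERROR.findall(log_text)),
    }


# Sample text copied from the log excerpt in this thread.
SAMPLE = """\
2012-05-22 09:43:50,831 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream 10.0.25.149:50010
2012-05-22 09:44:25,973 INFO org.apache.hadoop.hdfs.DFSClient:
Excluding datanode 10.0.25.149:50010
"""

if __name__ == "__main__":
    for kind, counts in tally_datanodes(SAMPLE).items():
        for addr, n in counts.most_common():
            print(kind, addr, n)
```

If a single address dominates the counts from every TaskTracker's logs, that points at one datanode; if the addresses are spread evenly, the problem is more likely cluster-wide.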
>On Tue, May 22, 2012 at 7:32 PM, Sandeep Reddy P
><[EMAIL PROTECTED]> wrote:
>> We have a 5-node CDH3u4 cluster running. When I try to run teragen/terasort,
>> some of the map tasks are Failed/Killed, and the logs show a similar error on
>> all machines.
>> 2012-05-22 09:43:50,831 INFO org.apache.hadoop.hdfs.DFSClient:
>> Exception in createBlockOutputStream 10.0.25.149:50010
>> java.net.SocketTimeoutException: 69000 millis timeout while waiting
>> for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10.0.25.149:55835
>> 2012-05-22 09:44:25,968 INFO org.apache.hadoop.hdfs.DFSClient:
>> Abandoning block blk_7260720956806950576_1825
>> 2012-05-22 09:44:25,973 INFO org.apache.hadoop.hdfs.DFSClient:
>> Excluding datanode 10.0.25.149:50010
>> 2012-05-22 09:46:36,350 WARN org.apache.hadoop.mapred.Task: Parent
>> died. Exiting attempt_201205211504_0007_m_000016_1.
>> Are these kinds of errors common? At least 1 map task is failing for the
>> above reason on every machine. We are using 24 mappers for teragen.
>> It took 3 hrs 44 min 17 sec to generate 50 GB of data with 24 mappers,
>> with 17 failed and 8 killed task attempts.
>> It took 24 min 10 sec to generate 5 GB of data with 24 mappers, with 9 killed task attempts.
>> The cluster works well for small datasets.
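To put the two timings above on a common scale, they can be converted to aggregate write throughput. This is a rough sketch only: it assumes 1 GB = 1000 MB and that the reported wall-clock times cover the whole teragen job; the function name is illustrative.

```python
def throughput_mb_per_s(gigabytes, hours=0, minutes=0, seconds=0):
    """Aggregate throughput in MB/s, assuming 1 GB = 1000 MB."""
    elapsed_s = hours * 3600 + minutes * 60 + seconds
    return gigabytes * 1000 / elapsed_s


# Figures taken from the message quoted above.
RUNS = {
    "50 GB teragen": throughput_mb_per_s(50, hours=3, minutes=44, seconds=17),
    "5 GB teragen": throughput_mb_per_s(5, minutes=24, seconds=10),
}

if __name__ == "__main__":
    for name, rate in RUNS.items():
        # Divide by the 24 mappers for a per-mapper figure.
        print(f"{name}: {rate:.2f} MB/s aggregate, {rate / 24:.3f} MB/s per mapper")
```

Under those assumptions both runs come out at roughly 3.5 to 3.7 MB/s for the whole cluster, so the 5 GB run does not appear to be proportionally faster than the 50 GB one.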