Re: Help with DFSClient Exception.
What's the block size? Also, are you experiencing any slowness in the network?

I am guessing you are using EC2.

These issues normally come with network problems.
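A quick way to answer the block-size question is to ask HDFS for the file's status. Here is a minimal sketch against the 0.20-era FileSystem API; BlockSizeCheck is a made-up class name, and the path comes from the command line:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeCheck {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // args[0] is the file to inspect, e.g. a part-r-* output file.
            FileStatus status = fs.getFileStatus(new Path(args[0]));
            System.out.println("block size  = " + status.getBlockSize() + " bytes");
            System.out.println("replication = " + status.getReplication());
        }
    }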

On Mon, May 28, 2012 at 3:57 PM, akshaymb <[EMAIL PROTECTED]> wrote:

>
> Hi,
>
> We are frequently observing the exception
>
> java.io.IOException: DFSClient_attempt_201205232329_28133_r_000002_0 could not complete file /output/tmp/test/_temporary/_attempt_201205232329_28133_r_000002_0/part-r-00002. Giving up.
>
> on our cluster.  The exception occurs while writing a file.  We are using
> Hadoop 0.20.2.  It's a ~250-node cluster, and on average one box goes down
> every 3 days.
>
> Detailed stack trace:
>
> 12/05/27 23:26:54 INFO mapred.JobClient: Task Id : attempt_201205232329_28133_r_000002_0, Status : FAILED
> java.io.IOException: DFSClient_attempt_201205232329_28133_r_000002_0 could not complete file /output/tmp/test/_temporary/_attempt_201205232329_28133_r_000002_0/part-r-00002. Giving up.
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3331)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3240)
>        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>        at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:106)
>        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> Our investigation:
> We have the min replication factor set to 2.  As mentioned at
> http://kazman.shidler.hawaii.edu/ArchDocDecomposition.html ("A call to
> complete() will not return true until all the file's blocks have been
> replicated the minimum number of times.  Thus, DataNode failures may cause a
> client to call complete() several times before succeeding"), the client
> should retry complete() several times.
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal() does call
> complete() and retries it up to 20 times, but in spite of that the file's
> blocks are not replicated the minimum number of times.  The retry count is
> not configurable.  Changing the min replication factor to 1 is also not a
> good idea, since jobs are continuously running on our cluster.
>
> Do we have any solution or workaround for this problem?
>
> What min replication factor is generally used in industry?
>
> Let me know if any further input is required.
>
> Thanks,
> -Akshay
>
>
>
> --
> View this message in context:
> http://old.nabble.com/Help-with-DFSClient-Exception.-tp33918949p33918949.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
--
Nitin Pawar
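For readers following the retry discussion above: based on Akshay's description (complete() polled up to a hard-coded 20 times, then "Giving up."), the close-time loop behaves roughly like the sketch below. This is an illustration, not the actual Hadoop 0.20.2 source; NameNodeLike is a stand-in for the NameNode RPC interface, and the 400 ms pause is an assumed value.

    import java.io.IOException;

    public class CompleteLoopSketch {
        // Stand-in for the one NameNode RPC call this sketch needs.
        interface NameNodeLike {
            boolean complete(String src, String clientName) throws IOException;
        }

        // complete() returns true only once every block of the file has
        // reached the minimum replication count, so a failing DataNode can
        // keep it returning false until the attempts run out.
        static void waitForComplete(NameNodeLike namenode, String src,
                                    String clientName) throws IOException {
            int retriesLeft = 20;  // hard-coded per the report above, not configurable
            boolean fileComplete = false;
            while (!fileComplete) {
                fileComplete = namenode.complete(src, clientName);
                if (!fileComplete) {
                    if (--retriesLeft == 0) {
                        throw new IOException("Could not complete file " + src
                                + ". Giving up.");
                    }
                    try {
                        Thread.sleep(400);  // pause before polling again (assumed interval)
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }
    }

Since the attempt count is fixed in the client, the practical levers are the ones Nitin points at (network health, failing nodes) and the cluster's minimum replication setting, which is exactly the trade-off Akshay describes.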