I am not sure of everything that may be causing this, especially because the stack trace is cut off. The file lease on your output file has expired. Typically the client is supposed to keep the file lease renewed, so if RPC had a very long hiccup you may be hitting this problem. It could also be related to the OutputCommitter in another task deleting the file out from under this task.
From: David Parks <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Reply-To: "[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>" <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Date: Monday, February 11, 2013 12:02 AM
To: "[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>" <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Subject: File does not exist on part-r-00000 file after reducer runs
Are there any rules against writing results to Reducer.Context while in the cleanup() method?
I’ve got a reducer that is downloading a few tens of millions of images from a set of URLs fed to it.
To be efficient I run many connections in parallel, but limit connections per domain and frequency of connections.
In order to do that efficiently I read in many URLs from the reduce method and queue them in a processing queue. At some point we have read in all the data and Hadoop calls the cleanup() method, where I block until all threads have finished processing.
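For readers who want to picture the pattern, here is a minimal plain-Java sketch of it. It deliberately uses no Hadoop classes: the class name, the per-domain limit of 2, the pool size of 8, and the "fetched:" placeholder are all hypothetical, and the in-memory list only stands in for writes to Reducer.Context. It is an illustration of "queue in reduce(), drain in cleanup()" under those assumptions, not the actual job code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Plain-Java stand-in for the reducer described above (hypothetical names throughout).
class QueueThenDrainSketch {
    private static final int MAX_PER_DOMAIN = 2; // assumed per-domain connection limit

    private final ExecutorService pool = Executors.newFixedThreadPool(8); // assumed pool size
    private final Map<String, Semaphore> domainLimits = new ConcurrentHashMap<>();
    private final List<String> output = new ArrayList<>(); // stands in for the task's output

    // Analogue of reduce(): enqueue the work and return immediately.
    void reduce(String domain, String url) {
        pool.submit(() -> {
            Semaphore limit =
                domainLimits.computeIfAbsent(domain, d -> new Semaphore(MAX_PER_DOMAIN));
            limit.acquireUninterruptibly(); // cap concurrent work per domain
            try {
                String result = "fetched:" + url; // placeholder for the real download
                synchronized (output) {           // results are written one at a time
                    output.add(result);
                }
            } finally {
                limit.release();
            }
        });
    }

    // Analogue of cleanup(): block until every queued task has finished.
    void cleanup() {
        pool.shutdown(); // accept no new work, let queued work finish
        try {
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // Still draining. A real reducer could signal liveness here;
                // whether that would affect the lease error is not established.
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        }
    }

    // Snapshot of everything written so far.
    List<String> results() {
        synchronized (output) {
            return new ArrayList<>(output);
        }
    }

    public static void main(String[] args) {
        QueueThenDrainSketch sketch = new QueueThenDrainSketch();
        for (int i = 0; i < 10; i++) {
            sketch.reduce("domain" + (i % 3), "url" + i);
        }
        sketch.cleanup();
        System.out.println(sketch.results().size() + " results written");
    }
}
```

The point of the sketch is only the timing: all output happens on worker threads, and much of it happens after the last reduce() call has returned.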
We may continue processing and writing results (in a synchronized manner) for 20 or 30 minutes after Hadoop reports 100% input records delivered, then at the end, my code appears to exit normally and I get this exception immediately after:
2013-02-11 05:15:23,606 INFO com.frugg.mapreduce.UrlProcessor (URL Processor Main Loop): Processing complete, shut down normally 1
2013-02-11 05:15:23,653 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2013-02-11 05:15:23,687 ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /frugg/image-cache-stage1/_temporary/_attempt_201302110210_0019_r_000002_0/part-r-00002 File does not exist. Holder DFSClient_attempt_201302110210_0019_r_000002_0 does not have any open files.
I have a suspicion that there are some subtle rules of Hadoop’s that I’m violating here.