Are there any rules against writing results to Reducer.Context while in the
cleanup() method?


I’ve got a reducer that downloads a few tens of millions of images from a
set of URLs fed to it.


To be efficient, I run many connections in parallel, but limit both the
number of concurrent connections per domain and the frequency of connections.
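Roughly, the per-domain limit works like this (a simplified sketch, not my
actual code; DomainThrottle, PER_DOMAIN_LIMIT, acquireSlot, and releaseSlot
are names invented for illustration):

import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// One Semaphore per host caps how many downloads may hit that domain
// at once; callers block in acquireSlot() until a slot frees up.
public class DomainThrottle {

    private static final int PER_DOMAIN_LIMIT = 2; // illustrative value

    private final ConcurrentMap<String, Semaphore> slots =
            new ConcurrentHashMap<String, Semaphore>();

    public void acquireSlot(String url) throws InterruptedException {
        slotFor(url).acquire();
    }

    public void releaseSlot(String url) {
        slotFor(url).release();
    }

    private Semaphore slotFor(String url) {
        String host = URI.create(url).getHost();
        Semaphore s = slots.get(host);
        if (s == null) {
            Semaphore fresh = new Semaphore(PER_DOMAIN_LIMIT);
            s = slots.putIfAbsent(host, fresh); // race-safe lazy init
            if (s == null) {
                s = fresh;
            }
        }
        return s;
    }
}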


In order to do that efficiently, the reduce() method reads in many URLs and
adds them to a processing queue. At some point all of the input has been
read, and Hadoop calls the cleanup() method, where I block until all
threads have finished processing.
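In outline, the reducer looks something like this (again a simplified
sketch rather than my real code; ImageFetchReducer, fetchImage, NUM_WORKERS,
and the poison-pill shutdown are stand-ins for illustration):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ImageFetchReducer extends Reducer<Text, Text, Text, Text> {

    private static final String POISON = "\u0000"; // shutdown marker; never a real URL
    private static final int NUM_WORKERS = 50;     // illustrative parallelism

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
    private Thread[] workers;

    @Override
    protected void setup(final Context context) {
        workers = new Thread[NUM_WORKERS];
        for (int i = 0; i < NUM_WORKERS; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (String url = queue.take(); !POISON.equals(url); url = queue.take()) {
                            String result = fetchImage(url); // throttled download
                            synchronized (context) {         // serialize concurrent writes
                                context.write(new Text(url), new Text(result));
                            }
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            workers[i].start();
        }
    }

    @Override
    protected void reduce(Text key, Iterable<Text> urls, Context context)
            throws InterruptedException {
        for (Text url : urls) {
            queue.put(url.toString()); // hand off; reduce() itself writes nothing
        }
    }

    @Override
    protected void cleanup(Context context) throws InterruptedException {
        for (int i = 0; i < NUM_WORKERS; i++) {
            queue.put(POISON);  // one shutdown marker per worker
        }
        for (Thread t : workers) {
            t.join();           // block until every queued download is written
        }
    }

    private String fetchImage(String url) {
        return "fetched";       // placeholder for the actual HTTP download
    }
}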


We may continue processing and writing results (in a synchronized manner)
for 20 or 30 minutes after Hadoop reports 100% of input records delivered.
At the end, my code appears to exit normally, and then I get this exception
immediately after:


2013-02-11 05:15:23,606 INFO com.frugg.mapreduce.UrlProcessor (URL Processor
Main Loop): Processing complete, shut down normally

2013-02-11 05:15:23,653 INFO org.apache.hadoop.mapred.TaskLogsTruncater
(main): Initializing logs' truncater with mapRetainSize=-1 and

2013-02-11 05:15:23,685 INFO (main):
Initialized cache for UID to User mapping with a cache timeout of 14400

2013-02-11 05:15:23,685 INFO (main):
Got UserName hadoop for UID 106 from the native implementation

2013-02-11 05:15:23,687 ERROR (main):
PriviledgedActionException as:hadoop
cause:org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
art-r-00002 File does not exist. Holder
DFSClient_attempt_201302110210_0019_r_000002_0 does not have any open files.



I have a suspicion that there are some subtle rules of Hadoop’s that I’m
violating.
