I posted the same question to the dev list earlier but haven't
heard back from anyone, so I figured someone here might be able to shed
some light on it. I have recently been reading Hadoop 1.1.0 source
code to better understand the internals and learned a lot from it so
far. When I was looking at ReduceTask.java, I saw some synchronization
attempts, some of which seem redundant to me. To be more specific, in
MapOutputCopier's copyOutput method, before calling
addToMapOutputFilesOnDisk, we synchronize on mapOutputFilesOnDisk, but
addToMapOutputFilesOnDisk itself synchronizes on the same object again.
On top of that, this entire block in copyOutput is already synchronized
on ReduceTask.this, which is a bit confusing to me. I'm not sure if
this is the right place to ask this question, but I'd appreciate it if
someone could point out why we need this sort of triple locking here.
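
To make the question concrete, here is a stripped-down sketch of the locking pattern as I understand it. This is not the actual Hadoop source: the class name NestedLockSketch, the size() helper, and the use of String elements are my own simplifications. Only the names copyOutput, addToMapOutputFilesOnDisk, and mapOutputFilesOnDisk come from ReduceTask.java, and in the real code copyOutput lives in the inner class MapOutputCopier, so the outermost lock is ReduceTask.this rather than this.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified stand-in for ReduceTask; NOT the actual Hadoop source.
class NestedLockSketch {
    // Stand-in for mapOutputFilesOnDisk (String elements are an assumption for brevity).
    private final SortedSet<String> mapOutputFilesOnDisk = new TreeSet<String>();

    // Innermost lock: the helper synchronizes on the set itself.
    void addToMapOutputFilesOnDisk(String file) {
        synchronized (mapOutputFilesOnDisk) {
            mapOutputFilesOnDisk.add(file);
        }
    }

    // The two outer locks as I read copyOutput: the enclosing object, then the set.
    void copyOutput(String file) {
        synchronized (this) {                     // ReduceTask.this in the real code
            synchronized (mapOutputFilesOnDisk) { // second lock
                // Third acquisition: same monitor as the second lock.
                addToMapOutputFilesOnDisk(file);
            }
        }
    }

    // Hypothetical helper, only here so the sketch can be checked.
    int size() {
        synchronized (mapOutputFilesOnDisk) {
            return mapOutputFilesOnDisk.size();
        }
    }

    public static void main(String[] args) {
        NestedLockSketch task = new NestedLockSketch();
        task.copyOutput("map_0.out");
        System.out.println(task.size());
    }
}
```

Since Java monitors are reentrant, re-acquiring the lock on mapOutputFilesOnDisk from the same thread does not deadlock, so the nesting is harmless in that sense. What I can't tell is whether the inner two locks add anything that the lock taken inside addToMapOutputFilesOnDisk does not already provide.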