I think we should revert YARN-1490 from the Hadoop 2.3 branch. It seems to
have been causing some strange behavior in the Oozie unit tests:
Basically, we use a single MiniMRCluster and MiniDFSCluster across all unit
tests in a module. With YARN-1490 we saw that, regardless of test order,
the last few tests would time out waiting for an MR job to finish; on slower
machines, the entire test suite would time out. After some digging, I
found that we were getting a ton of "Connection refused" exceptions from the
LeaseRenewer talking to the NN and a few from the AM talking to the RM.
After a bunch of investigation, I found that the problem went away once
YARN-1490 was reverted, though I couldn't figure out the exact cause.
Even though this occurred in unit tests, it does make me concerned that it
could indicate some bigger issue in a long-running real cluster (where
everything isn't running on the same machine) that we haven't seen yet.
On Thu, Feb 6, 2014 at 3:06 PM, Karthik Kambatla <[EMAIL PROTECTED]> wrote: