On May 22, 2013, at 3:35pm, Jason Weiss wrote:
OK, thanks. Sounds like you were pegged on CPU usage.
But that does surprise me a bit. Did you check that you were using all cores?
PS - back in 2006 I spent a week of hell debugging an occasional job failure on Hadoop (this was when it was still part of Nutch). Turns out one of our 12 slaves was accidentally using OpenJDK, and this had a JIT compiler bug that would occasionally rear its ugly head. Obviously the Sun/Oracle JRE isn't bug-free, but it gets a lot more stress testing. So one of my basic guidelines in the ops portion of the Hadoop class I teach is that every server must have exactly the same version of Oracle's JRE.
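For what it's worth, here is a minimal sketch of one way to check both points (JRE consistency and core count) on a node. It only reads standard JVM system properties; the class name JvmFingerprint is made up for illustration, and how you run it on each server and compare the output is up to you.

```java
public class JvmFingerprint {
    // Builds a one-line summary of the JVM this class is running on,
    // using only standard system properties and the Runtime API.
    static String fingerprint() {
        return System.getProperty("java.vendor") + " | "
             + System.getProperty("java.vm.name") + " | "
             + System.getProperty("java.version") + " | cores="
             + Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // Run this on every server with the same java binary the Hadoop
        // daemons use; any line that differs from the rest is worth a look.
        System.out.println(fingerprint());
    }
}
```

Note that the core count is what the JVM sees as available, which says nothing about whether your job is actually keeping those cores busy.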
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr