A point to mention from http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/:
If each task takes less than 30-40 seconds, reduce the number of tasks. The task setup and scheduling overhead is a few seconds, so if tasks finish very quickly, you’re wasting time while not doing work. JVM reuse can also be enabled to solve this problem.
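Further, I can think of another benefit: if the mapper has to build a large in-memory structure in the child JVM (let's say the implementation needs a huge tree), that structure can be kept in a static field and reused by later tasks that run in the same JVM, rather than being created again and again for every task.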
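On the setup() question: as far as I know, setup() is still called once for every task even when the JVM is reused, because each task gets its own Mapper instance, and I am not aware of a switch that suppresses it. What you can do is make the expensive part run only once per JVM by guarding it with a static field. Below is a minimal sketch of that pattern in plain Java, without the Hadoop dependency so it can be run on its own. SharedTreeCache, FakeMapper and BUILD_COUNT are made-up names for illustration; in a real job the getTree() call would sit inside your Mapper's setup(). JVM reuse itself is switched on in Hadoop 1.x with the mapred.job.reuse.jvm.num.tasks property (-1 means no limit) or JobConf.setNumTasksToExecutePerJvm().

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: holds a large structure in a static field so that it is
// built at most once per JVM, no matter how many tasks run in that JVM.
class SharedTreeCache {
    // Counts how often the expensive build really ran (illustration only).
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // Static state lives as long as the class stays loaded in the JVM.
    private static TreeMap<String, Integer> tree;

    // Lazily builds the tree on first use; later callers get the same instance.
    static synchronized TreeMap<String, Integer> getTree() {
        if (tree == null) {
            tree = buildTree();
        }
        return tree;
    }

    // Stand-in for the expensive construction step.
    private static TreeMap<String, Integer> buildTree() {
        BUILD_COUNT.incrementAndGet();
        TreeMap<String, Integer> t = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            t.put("key" + i, i);
        }
        return t;
    }

    // Stand-in for a Hadoop Mapper: one new instance is created per task.
    static class FakeMapper {
        private TreeMap<String, Integer> tree;

        // Plays the role of Mapper.setup(): called for every task,
        // but it only triggers the build the first time in this JVM.
        void setup() {
            tree = SharedTreeCache.getTree();
        }

        Integer lookup(String key) {
            return tree.get(key);
        }
    }

    public static void main(String[] args) {
        // Simulate two tasks running one after the other in the same JVM.
        FakeMapper first = new FakeMapper();
        first.setup();
        FakeMapper second = new FakeMapper();
        second.setup();
        System.out.println("builds: " + BUILD_COUNT.get()); // prints "builds: 1"
    }
}
```

Keep in mind that the static field is shared only between tasks that actually land in the same JVM. A task started in a fresh JVM, or on another node, will build the tree again, so the code must not assume the tree is already there.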
On Jun 4, 2012, at 2:12 PM, Arpit Wanchoo wrote:
> I wanted to check what exactly we gain when JVM reuse is enabled in a MapReduce job.
> My question is about the setup() method of the mapper: is it called for a mapper even when that mapper runs in a JVM reused from a previously run mapper?
> If yes, is there any way I can control it, or stop it from being called more than once?