OK, thanks Bejoy.
So it is only possible in certain scenarios, like the one you describe:
many more mappers than mapper slots.
On Tue, Apr 16, 2013 at 2:40 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
> Hi Rahul
> Consider larger clusters and jobs that involve larger input data sets. The
> data would be spread across the whole cluster, and a single node might hold
> several blocks of that data set. Imagine you have a cluster with 100 map
> slots and your job has 500 map tasks. In that case multiple map tasks of the
> job must run on a single TaskTracker, based on slot availability.
> Here, if you enable JVM reuse, all tasks related to a job on a single
> TaskTracker would use the same JVM. The benefit is just the time you save
> in spawning and cleaning up a JVM for each individual task.
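To put rough numbers on the 100-slot, 500-task example, here is a minimal sketch. It assumes the tasks spread evenly over the slots and that, with reuse enabled without a limit (in MRv1 the setting is mapred.job.reuse.jvm.num.tasks, where -1 means no limit), each slot keeps one JVM for the job's tasks. The class and method names are made up for illustration and are not part of Hadoop.

```java
// Back-of-the-envelope estimate only; not Hadoop code.
class JvmReuseEstimate {

    // Number of sequential "waves" needed to run all map tasks on the slots.
    public static int waves(int mapTasks, int mapSlots) {
        return (mapTasks + mapSlots - 1) / mapSlots; // ceiling division
    }

    // JVM launches for the job.
    // Without reuse: every task gets a fresh JVM.
    // With unlimited reuse (assumption): each occupied slot launches one JVM
    // and runs its share of the job's tasks in it, one after another.
    public static int jvmLaunches(int mapTasks, int mapSlots, boolean reuse) {
        return reuse ? Math.min(mapTasks, mapSlots) : mapTasks;
    }

    public static void main(String[] args) {
        int mapTasks = 500;
        int mapSlots = 100;
        System.out.println("waves: " + waves(mapTasks, mapSlots));
        System.out.println("JVM launches without reuse: "
                + jvmLaunches(mapTasks, mapSlots, false));
        System.out.println("JVM launches with reuse: "
                + jvmLaunches(mapTasks, mapSlots, true));
    }
}
```

Under these assumptions reuse would avoid roughly 400 JVM start-ups and tear-downs over the job, which tends to matter most when the individual tasks are short.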
> On Tue, Apr 16, 2013 at 2:04 PM, Rahul Bhattacharjee <
> [EMAIL PROTECTED]> wrote:
>> I have a question related to VM reuse in Hadoop. I now understand the
>> purpose of VM reuse, but I am wondering how it is useful.
>> For example, for VM reuse to be effective, or even to kick in, we need
>> more than one mapper task of the same job to be submitted to a single node.
>> Hadoop prefers to spawn mappers on the nodes that actually contain the
>> data, so it might rarely happen that multiple mappers are allocated to a
>> single TaskTracker. And even if a single node gets to run multiple mappers,
>> they might as well run in parallel in multiple VMs rather than sequentially
>> in a single VM.
>> I am sure I am missing some link here; please help me find it.