On Mon, Jan 23, 2012 at 5:11 PM, Jie Li <[EMAIL PROTECTED]> wrote:
> What we are looking for, is more of the difference at the task level.
> Suppose a map task takes 10 minutes in Hadoop, then we have a model to
> analyse what makes up the 10 minutes, e.g. reading from HDFS, invoking the
> map function, writing to the buffer, partitioning, sorting and merging.
> This model can be used to identify the bottleneck of the task execution and
> suggest better configurations.
The per-task run time hasn't changed between 0.21/0.22 and the new
runtime. It has changed if you compare against 0.20: the new runtime
includes a lot of performance improvements and is expected to get
better with all the optimizations. To answer your question: yes, your
'model' shouldn't need any changes.
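To make sure we're talking about the same thing, here's a minimal sketch of the kind of additive cost model you describe: total map-task time decomposed into per-phase costs, with the largest share flagged as the bottleneck. The phase names and the sample numbers are purely illustrative assumptions, not anything measured from Hadoop.

```python
# Hypothetical per-task cost model: a map task's run time is the sum
# of its phase times (read from HDFS, run the map function, write to
# the sort buffer, partition, sort, merge). All values are made up.

def breakdown(phase_seconds):
    """Return each phase's fraction of the total task time."""
    total = sum(phase_seconds.values())
    return {phase: t / total for phase, t in phase_seconds.items()}

# Illustrative 10-minute (600 s) map task, split across phases.
sample = {
    "hdfs_read":    120,
    "map_function": 180,
    "buffer_write":  60,
    "partition":     30,
    "sort":         150,
    "merge":         60,
}

shares = breakdown(sample)
bottleneck = max(shares, key=shares.get)  # phase with the largest share
```

Nothing in this decomposition depends on which daemon hosts the task, which is why the same model should carry over to MRv2 as long as the phases themselves are unchanged.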
> If we run MR jobs in YARN, can we use the same model to analyse the running
> time of a task? One possible difference I've noticed so far is that the
> shuffling has become a service of the node manager. Any other change
> related to the map phase or reduce phase?
Shuffle used to be part of the TaskTracker; it now runs as a service
in the NM. Other than that, there isn't much difference that should
affect you.