Suppose we have a 640 MB data file and 3 Data Nodes in a cluster.
With a 64 MB block size, the file is split into 10 blocks, and the mappers M1, M2, M3 start first (roughly one per Data Node).
As each one completes its task, M4 and so on will be run.
So it appears that it is not necessary to run all 10 map tasks in parallel at once.
Is this a correct assumption?
Also, what if we have a 10 TB data file with the same 3 Data Nodes? How do we determine the number of mappers that will be created?
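My current understanding is that the number of map tasks equals the number of input splits, which by default is roughly one per HDFS block, i.e. about ceil(file_size / split_size). A rough sketch of that arithmetic (the function name and block sizes here are just illustrative assumptions, not Hadoop API):

```python
import math

def estimate_mappers(file_size_bytes, split_size_bytes):
    # Assumption: one map task per input split, and the split size
    # defaults to the HDFS block size unless configured otherwise.
    return math.ceil(file_size_bytes / split_size_bytes)

MB = 1024 ** 2
TB = 1024 ** 4

# 640 MB file with a 64 MB block size -> 10 splits, so 10 map tasks
print(estimate_mappers(640 * MB, 64 * MB))    # 10

# 10 TB file with a 128 MB block size
print(estimate_mappers(10 * TB, 128 * MB))    # 81920
```

If that is right, the mapper count depends on the file size and split size, not on the number of Data Nodes; the 3 nodes only limit how many of those tasks run concurrently. Please correct me if this is wrong.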