Re: question about scheduler rack awareness
Jens Scheidtmann 2013-03-19, 08:07
> My question is: will the map task be created on the node that can access its
> 10 blocks the fastest?
Hadoop tries hard to run map tasks on the nodes where the data is
stored. "Hadoop: The Definitive Guide" has some UML sequence diagrams showing
what happens when map JVMs are created. Sorry, I haven't been able to find
them on the web yet (well, except on safaribooksonline.com ;-).
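To illustrate the preference order behind rack awareness (node-local first, then rack-local, then off-rack), here is a minimal Java sketch. It is not Hadoop's actual scheduler code. The class and method names (`LocalitySketch`, `classify`, `pickSplit`) and the host/rack names are hypothetical, and it assumes you already know which hosts hold each split's replicas and which rack each host is on.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LocalitySketch {
    // Locality levels in order of preference (lower ordinal = better).
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    // Classify how close a free slot on `slotHost` is to a split whose
    // replicas live on `replicaHosts`. `rackOf` maps host -> rack id.
    static Locality classify(String slotHost, Set<String> replicaHosts,
                             Map<String, String> rackOf) {
        if (replicaHosts.contains(slotHost)) {
            return Locality.NODE_LOCAL;
        }
        String slotRack = rackOf.get(slotHost);
        for (String h : replicaHosts) {
            if (slotRack != null && slotRack.equals(rackOf.get(h))) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.OFF_RACK;
    }

    // Return the index of the pending split with the best locality
    // for this slot (-1 if there are no pending splits).
    static int pickSplit(String slotHost, List<Set<String>> pendingSplits,
                         Map<String, String> rackOf) {
        int best = -1;
        Locality bestLevel = null;
        for (int i = 0; i < pendingSplits.size(); i++) {
            Locality level = classify(slotHost, pendingSplits.get(i), rackOf);
            if (bestLevel == null || level.compareTo(bestLevel) < 0) {
                best = i;
                bestLevel = level;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of(
            "nodeA", "/rack1", "nodeB", "/rack1", "nodeC", "/rack2");
        List<Set<String>> splits = List.of(
            Set.of("nodeC"),            // off-rack for nodeA
            Set.of("nodeB"),            // rack-local for nodeA
            Set.of("nodeA", "nodeC"));  // node-local for nodeA
        System.out.println(pickSplit("nodeA", splits, rackOf)); // prints 2
    }
}
```

For the free slot on `nodeA`, split 2 wins because one of its replicas is on `nodeA` itself; the rack-local split 1 would be the fallback if split 2 were already taken.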
Depending on the specific data layout (e.g. record lengths, since a record
can span a block boundary), a map task may need to read parts of other
blocks anyway, and those may be stored off-node.
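The point about records crossing block boundaries can be sketched like this, assuming newline-delimited records and the rule that a record belongs to the split containing its first byte. This is roughly the idea behind Hadoop's line-oriented input, but `SplitRecords.recordsForSplit` is a hypothetical illustration, not the real `LineRecordReader`, whose boundary handling differs in detail.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitRecords {
    // Return the newline-terminated records owned by the byte range
    // [start, end) of `data`. Assumed rule: a record belongs to the split
    // containing its first byte, so the reader may have to read past `end`
    // (i.e. into the next, possibly off-node, block) to finish it.
    static List<String> recordsForSplit(byte[] data, int start, int end) {
        List<String> out = new ArrayList<>();
        int pos = start;
        // Starting mid-record: that record belongs to the previous split, so skip it.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        // Read every record that starts before `end`, even if it ends after it.
        while (pos < end && pos < data.length) {
            int recEnd = pos;
            while (recEnd < data.length && data[recEnd] != '\n') recEnd++;
            out.add(new String(data, pos, recEnd - pos, StandardCharsets.UTF_8));
            pos = recEnd + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        // Pretend the block boundary falls at byte 8, inside "bravo".
        System.out.println(recordsForSplit(data, 0, 8));           // [alpha, bravo]
        System.out.println(recordsForSplit(data, 8, data.length)); // [charlie]
    }
}
```

Here the first split has to read past byte 8 to finish "bravo"; in a real cluster those bytes would come from the next block, which may live on another node.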
On how many nodes is your 100-block file stored? On 10?
If it is on one node, then you're likely running into map slot limits or