
HDFS user mailing list

Re: question about scheduler rack awareness
Dear Jin,

you wrote:
> My question is: will the map task be created on the node that can access
> its 10 blocks fastest?

Hadoop tries hard to run each map task on a node where its input data is
stored. "Hadoop: The Definitive Guide" has some UML sequence diagrams showing
what happens when map JVMs are created. Sorry, I have not yet been able to
find them on the web again (well, apart from safaribooksonline.com ;-).
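To illustrate the preference order (node-local first, then rack-local, then off-rack), here is a minimal sketch. It is a simplified, hypothetical model, not the actual JobTracker or Fair/Capacity scheduler code, which add further policies on top. The names `pick_task`, `locality_level`, `pending` and `rack_of` are made up for illustration.

```python
# Hypothetical, simplified model of locality-aware map task assignment.
# Real Hadoop schedulers are more involved; this only shows the
# node-local > rack-local > off-rack preference order.

def locality_level(task_hosts, node, rack_of):
    """Return 0 for node-local, 1 for rack-local, 2 for off-rack."""
    if node in task_hosts:
        return 0
    if rack_of[node] in {rack_of[h] for h in task_hosts}:
        return 1
    return 2


def pick_task(pending, node, rack_of):
    """Pick the pending task with the best locality for a node with a free slot.

    pending: dict mapping task id -> list of hosts holding a replica of its split.
    rack_of: dict mapping host -> rack name.
    Ties are broken by task id so the result is deterministic.
    """
    if not pending:
        return None
    return min(pending, key=lambda t: (locality_level(pending[t], node, rack_of), t))


rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
pending = {"m0": ["n3"], "m1": ["n2", "n3"], "m2": ["n1"]}
print(pick_task(pending, "n1", rack_of))  # m2 (node-local)
print(pick_task(pending, "n2", rack_of))  # m1 (node-local)
```

A node only gets a rack-local or off-rack task when no pending task has a replica on that node.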

Depending on the specific data layout (e.g. records that cross block
boundaries), a map task may need to read parts of other blocks anyway, and
those may be off-node.
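A small sketch of why a split can touch more than its own block. It assumes one split per block and that a split processes every record whose first byte falls inside it, reading past the split end to finish the last record (roughly how line-oriented input formats behave). The function name and record sizes are hypothetical.

```python
# Hypothetical model: which HDFS blocks does the map task for one split read?
# Assumes one split per block, and that a split owns every record that
# *starts* inside it, reading into the next block to finish the last record.

def blocks_read(record_lengths, block_size, split_index):
    """Return the sorted block indices read by the map task for split_index."""
    split_start = split_index * block_size
    split_end = split_start + block_size
    touched = set()
    offset = 0
    for length in record_lengths:
        if split_start <= offset < split_end:
            first = offset // block_size
            last = (offset + length - 1) // block_size
            touched.update(range(first, last + 1))
        offset += length
    return sorted(touched)


# Four 60-byte records in 100-byte blocks: records straddle block boundaries.
print(blocks_read([60] * 4, 100, 0))  # [0, 1]
print(blocks_read([60] * 4, 100, 1))  # [1, 2]
```

If block 1 has no replica on the node running split 0, that part of the read goes over the network.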

On how many nodes is your 100-block file stored? On 10?

If it is all on one node, then you are likely running into map slot limits
(MRv1) or container limits (YARN) on that node.
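Some back-of-the-envelope arithmetic for the slot limit, as a sketch under simplifying assumptions: every map task runs strictly node-local, tasks are spread evenly over the nodes holding the data, all tasks take the same time, and each node runs at most `slots_per_node` maps at once. The slot counts used below are hypothetical, not your cluster's settings.

```python
import math

# Hypothetical estimate of how many "waves" of map tasks are needed when
# tasks only run on the nodes that hold their data.

def map_waves(num_tasks, nodes_with_data, slots_per_node):
    """Rounds needed if at most nodes_with_data * slots_per_node maps run at once."""
    concurrent = nodes_with_data * slots_per_node
    return math.ceil(num_tasks / concurrent)


print(map_waves(100, 1, 2))   # 50 waves: all 100 blocks on one node
print(map_waves(100, 10, 2))  # 5 waves: blocks spread over 10 nodes
```

With all blocks on a single node, that node's slot (or container) count caps parallelism no matter how many other nodes are idle, unless the scheduler gives up on locality.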

Best regards,