

Re: question about scheduler rack awareness
Dear Jin,

you wrote:
> my question is : will the map task created on the node which access his
> 10 blocks most fastest ?

Hadoop tries hard to run each map task on a node where its input data is
stored. "Hadoop: The Definitive Guide" has some UML sequence diagrams
showing what happens when the map JVMs are created. Sorry, I have not been
able to locate them on the web yet (well, apart from safaribooksonline.com ;-).
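
To make the locality preference a bit more concrete, here is a toy sketch in
Python. It is not Hadoop's scheduler code; the function names, the dict
layout of a split and the rack map are all made up for illustration. It only
shows the general idea of preferring node-local over rack-local over
off-rack work when a node has a free slot.

```python
# Toy model of locality-aware task assignment (NOT Hadoop's actual code).
# All names here (locality_level, pick_split, the "hosts" key) are hypothetical.

LOCALITY_ORDER = {"node-local": 0, "rack-local": 1, "off-rack": 2}


def locality_level(replica_hosts, node, rack_of):
    """Classify how close `node` is to the hosts holding a split's replicas."""
    if node in replica_hosts:
        return "node-local"
    replica_racks = {rack_of[h] for h in replica_hosts}
    if rack_of[node] in replica_racks:
        return "rack-local"
    return "off-rack"


def pick_split(free_node, pending_splits, rack_of):
    """Pick the pending split with the best locality for `free_node`.

    Ties are broken by position in `pending_splits` (first one wins).
    """
    return min(
        pending_splits,
        key=lambda s: LOCALITY_ORDER[locality_level(s["hosts"], free_node, rack_of)],
    )


if __name__ == "__main__":
    rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
    splits = [
        {"id": "a", "hosts": ["n3"]},  # off-rack for n1
        {"id": "b", "hosts": ["n2"]},  # rack-local for n1
        {"id": "c", "hosts": ["n1"]},  # node-local for n1
    ]
    print(pick_split("n1", splits, rack_of)["id"])
```

If no node-local split is left for the free node, the same function falls
back to a rack-local one and then to any remaining split, which is the
"tries hard, but does not guarantee" behaviour described above.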

Depending on the specific data layout (e.g. record lengths), a map task may
still need to read parts of other blocks, for example when a record straddles
a block boundary, and those blocks may be off-node.
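
As a small illustration of the point about record lengths, the following
Python sketch computes which blocks a single record touches, given its byte
offset and length. The function name and the tiny block size are assumptions
chosen for readability; real block sizes are of course much larger.

```python
# Which fixed-size blocks does a record overlap? (illustrative helper,
# hypothetical name; assumes record_length >= 1 and zero-based offsets)

def blocks_touched(record_offset, record_length, block_size):
    """Return the indices of all blocks that the record's bytes fall into."""
    first = record_offset // block_size
    last = (record_offset + record_length - 1) // block_size
    return list(range(first, last + 1))


if __name__ == "__main__":
    block_size = 128  # toy value, in bytes
    print(blocks_touched(0, 100, block_size))    # record fits inside block 0
    print(blocks_touched(100, 50, block_size))   # record crosses into block 1
```

A map task whose split starts in block 0 would have to fetch the tail of the
second record from block 1, and that block may well live on a different node.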

On how many nodes is your 100-block file stored? On 10?

If it is on one node, then you're likely running into map slot limits or
container limits.

Best regards,

Jens