HDFS user mailing list

Question about HDFS Architecture
To read a file, I understand that a client first contacts the namenode to find out which datanodes hold the file's blocks, and then reads the data directly from those datanodes.
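To make the two steps concrete, here is a toy Python model of that read path. The `NameNode` and `DataNode` classes and their methods are hypothetical and are not the real HDFS client API. The point is only that the namenode serves metadata (block IDs and the datanodes holding each one), while the bytes come straight from a datanode.

```python
# Toy model of the two-step read path: metadata from the namenode,
# data from the datanodes. All names are hypothetical, not the HDFS API.
from dataclasses import dataclass, field


@dataclass
class NameNode:
    # Metadata only: file path -> ordered list of (block_id, [datanode names]).
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.block_map[path]


@dataclass
class DataNode:
    name: str
    # Actual block contents: block_id -> bytes.
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, namenode, datanodes):
    """Step 1: ask the namenode where each block lives.
    Step 2: fetch each block directly from a datanode that holds it."""
    data = b""
    for block_id, hosts in namenode.get_block_locations(path):
        # Simplification: always read from the first listed replica.
        data += datanodes[hosts[0]].read_block(block_id)
    return data


dn1 = DataNode("dn1", {"blk_1": b"hello "})
dn2 = DataNode("dn2", {"blk_2": b"world"})
nn = NameNode({"/user/demo.txt": [("blk_1", ["dn1"]), ("blk_2", ["dn2"])]})
print(read_file("/user/demo.txt", nn, {"dn1": dn1, "dn2": dn2}))  # b'hello world'
```

Note that the namenode never touches file contents in this model; it only answers "where", which is what keeps it off the data path.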

Does the client cache these block locations, or does it always contact the namenode first?
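The trade-off behind the caching question can be sketched with a small, hypothetical client that memoizes location lookups and counts how many requests actually reach the namenode. This is an illustration of the idea only; the class names are made up and it does not claim to describe what the real HDFS client does.

```python
# Hypothetical sketch: counting namenode round trips when the client
# caches block locations. Names are illustrative only.
class CountingNameNode:
    def __init__(self, block_map):
        self.block_map = block_map  # path -> list of (block_id, [hosts])
        self.calls = 0              # number of metadata lookups served

    def get_block_locations(self, path):
        self.calls += 1
        return self.block_map[path]


class CachingClient:
    def __init__(self, namenode):
        self.namenode = namenode
        self._locations = {}  # path -> cached block locations

    def locate(self, path):
        # Only contact the namenode on a cache miss.
        if path not in self._locations:
            self._locations[path] = self.namenode.get_block_locations(path)
        return self._locations[path]

    def invalidate(self, path):
        # Cached entries can go stale (e.g. a datanode dies or replicas
        # move), so a cache like this needs a way to drop them.
        self._locations.pop(path, None)


nn = CountingNameNode({"/user/demo.txt": [("blk_1", ["dn1", "dn2"])]})
client = CachingClient(nn)
for _ in range(3):
    client.locate("/user/demo.txt")
print(nn.calls)  # 1: only the first lookup reached the namenode
```

The cache saves namenode round trips, at the cost of having to detect and discard stale locations.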

Also, if a block has replicas on several datanodes in the same rack, how does the namenode decide which datanode the client should read from? In this case all the datanodes are homogeneous, so rack awareness cannot help make the decision.
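One plausible policy, sketched below under stated assumptions, is to order replicas by network distance from the client and break ties randomly, so that equally distant replicas share the read load. The distance values (0 for the same node, 2 for the same rack, 4 for another rack), the `rack_of` map, and the function names are assumptions for illustration; check `NetworkTopology` in your Hadoop version for the actual behavior.

```python
import random


# Assumed simplified tree distance: 0 same node, 2 same rack, 4 other rack.
def distance(a, b, rack_of):
    if a == b:
        return 0
    if rack_of[a] == rack_of[b]:
        return 2
    return 4


def order_replicas(client, replicas, rack_of, rng=random):
    """Sort replicas nearest-first. Shuffling before a stable sort leaves
    equally distant replicas in random order."""
    shuffled = list(replicas)
    rng.shuffle(shuffled)
    return sorted(shuffled, key=lambda dn: distance(client, dn, rack_of))


# Hypothetical single-rack cluster: every replica is equally close.
rack_of = {"client": "/rack1", "dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1"}
rng = random.Random(0)
counts = {"dn1": 0, "dn2": 0, "dn3": 0}
for _ in range(3000):
    counts[order_replicas("client", ["dn1", "dn2", "dn3"], rack_of, rng)[0]] += 1
print(counts)  # reads spread across all three replicas
```

When every replica sits on the client's rack, distance cannot break the tie, so under this policy the random shuffle alone decides which datanode is tried first.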