HDFS, mail # user - Question about HDFS Architecture


Harold Lim 2009-08-20, 22:44
Aaron Kimball 2009-08-21, 07:36
Harold Lim 2009-08-21, 13:42
Aaron Kimball 2009-08-25, 00:55

Re: Question about HDFS Architecture
Konstantin Shvachko 2009-08-25, 01:40
Harold,

Both answers by Aaron were incorrect.

 > Does the client cache this information, or does it always talk to the namenode first?

Yes, the client caches replica locations received from the name-node.
On open() it receives the locations of the first 10 blocks of the file.
In most cases that covers every block of the file. If not, the client
fetches the next portion of block locations when it needs them, and caches those as well.
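
The caching behaviour described above can be sketched roughly as follows. This is a toy model, not Hadoop client code: ToyNameNode, ToyClient, get_block_locations and locations_for are hypothetical names invented for illustration, and the window of 10 blocks is simply the figure quoted in this message.

```python
PREFETCH_BLOCKS = 10  # window size quoted in the message; assumed fixed here


class ToyNameNode:
    """Hypothetical stand-in for the name-node: block index -> replica hosts."""

    def __init__(self, block_locations):
        self.block_locations = block_locations
        self.calls = 0  # counts round trips so the effect of caching is visible

    def get_block_locations(self, first_block, count=PREFETCH_BLOCKS):
        # Return locations for up to `count` blocks starting at `first_block`.
        self.calls += 1
        last = min(first_block + count, len(self.block_locations))
        return {i: self.block_locations[i] for i in range(first_block, last)}


class ToyClient:
    """Hypothetical client that caches whatever the name-node returns."""

    def __init__(self, namenode):
        self.namenode = namenode
        self.cache = {}

    def open(self):
        # On open, fetch and cache the first window of block locations.
        self.cache.update(self.namenode.get_block_locations(0))

    def locations_for(self, block_index):
        # Ask the name-node only on a cache miss, then cache the new window.
        # A block index past the end of the file raises KeyError.
        if block_index not in self.cache:
            self.cache.update(self.namenode.get_block_locations(block_index))
        return self.cache[block_index]
```

With a 25-block file, open() costs one name-node call, reading block 3 costs none, and reading block 12 triggers a second call that also covers blocks 13 through 21.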

 > Also, if a file has multiple replicas stored on multiple datanodes on the same "rack", how does the namenode pick which datanode the client has to talk to?

The name-node returns block locations ordered by proximity to the client.
The client always contacts data-nodes in this order. It cannot judge proximity
itself because it has no knowledge of the cluster topology.
If all replicas are on the same rack but not local to the client, then the ordering
returned by the name-node is arbitrary.
This happens mostly when network topology is not configured.
Otherwise replicas should be distributed across different racks:
3 replicas should be on at least 2 racks.
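
As a rough illustration of the ordering described above, here is a minimal sketch of sorting replicas by proximity on the name-node side. It is not the name-node's actual code: distance, order_by_proximity and rack_count are hypothetical helpers, the distance values 0, 2 and 4 are assumptions chosen for the example, and hosts missing from the rack map are assumed to fall into a single default rack, which models an unconfigured topology.

```python
DEFAULT_RACK = "/default-rack"  # assumed rack for hosts with no configured rack


def distance(client_host, replica_host, rack_of):
    """Toy distance: 0 for the same node, 2 for the same rack, 4 otherwise."""
    if client_host == replica_host:
        return 0
    client_rack = rack_of.get(client_host, DEFAULT_RACK)
    replica_rack = rack_of.get(replica_host, DEFAULT_RACK)
    return 2 if client_rack == replica_rack else 4


def order_by_proximity(client_host, replicas, rack_of):
    """Return the replicas sorted nearest-first from the client's position."""
    # sorted() is stable, so replicas at equal distance keep their input
    # order; that stands in for the "arbitrary" ordering in the message.
    return sorted(replicas, key=lambda host: distance(client_host, host, rack_of))


def rack_count(replicas, rack_of):
    """Number of distinct racks that hold a replica of the block."""
    return len({rack_of.get(host, DEFAULT_RACK) for host in replicas})
```

With an empty rack map every non-local replica is at the same distance, so the order the client sees carries no proximity information. rack_count can be used to check the "3 replicas on at least 2 racks" expectation for a given placement.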

Thanks
--Konstantin
Harold Lim wrote:
> To read/get a file, I understand that a client first contacts the namenode to determine which datanode has the file/block. Then, it contacts the datanode for the actual file.
>
> Does the client cache this information, or does it always talk to the namenode first?
>
> Also, if a file has multiple replicas stored on multiple datanodes on the same "rack", how does the namenode pick which datanode the client has to talk to? In this case, all datanodes are homogeneous, which makes the "rack-awareness" unimportant to the decision making.
>
> Thanks,
> Harold

Harold Lim 2009-08-25, 02:00
Konstantin Shvachko 2009-08-25, 02:07
Todd Lipcon 2009-08-25, 04:43
Harold Lim 2009-08-25, 04:57
Todd Lipcon 2009-08-25, 05:21
Aaron Kimball 2009-08-28, 22:24