HDFS >> mail # user >> Question about HDFS Architecture


Harold Lim 2009-08-20, 22:44
Aaron Kimball 2009-08-21, 07:36
Harold Lim 2009-08-21, 13:42
Aaron Kimball 2009-08-25, 00:55
Konstantin Shvachko 2009-08-25, 01:40
Harold Lim 2009-08-25, 02:00
Re: Question about HDFS Architecture
Yes, the client continues to use the info until it becomes invalid.
After that it will contact the name-node and update the cache.
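
A minimal sketch of the caching behavior described in this thread, written in Python rather than taken from the HDFS client code. FakeNameNode, CachingClient, and read_from are hypothetical names invented for illustration, and the single retry after a refresh is an assumption; only the 10-block figure comes from the thread.

```python
from typing import Callable, Dict, List, Optional

PREFETCH_BLOCKS = 10  # from the thread: open() returns locations of the first 10 blocks


class FakeNameNode:
    """Hypothetical stand-in for the name-node; counts lookups for illustration."""

    def __init__(self, locations: Dict[int, List[str]]):
        self.locations = locations  # block index -> data-nodes, in name-node order
        self.calls = 0

    def get_block_locations(self, start: int, count: int) -> Dict[int, List[str]]:
        self.calls += 1
        return {i: list(self.locations[i])
                for i in range(start, start + count) if i in self.locations}


class CachingClient:
    """Hypothetical client that caches replica locations until they prove invalid."""

    def __init__(self, namenode: FakeNameNode, read_from: Callable[[str, int], bytes]):
        self.namenode = namenode
        self.read_from = read_from  # assumed transport: (datanode, block) -> bytes
        # open(): fetch and cache the first portion of block locations
        self.cache = namenode.get_block_locations(0, PREFETCH_BLOCKS)

    def read_block(self, block: int) -> bytes:
        if block not in self.cache:
            # Block not covered yet: fetch another portion and cache it as well
            self.cache.update(self.namenode.get_block_locations(block, PREFETCH_BLOCKS))
        data = self._try_replicas(block)
        if data is None:
            # Cached info became invalid: contact the name-node, update the cache, retry once
            self.cache.update(self.namenode.get_block_locations(block, PREFETCH_BLOCKS))
            data = self._try_replicas(block)
        if data is None:
            raise IOError(f"no live replica for block {block}")
        return data

    def _try_replicas(self, block: int) -> Optional[bytes]:
        # Contact data-nodes in the order the name-node returned them
        for datanode in self.cache.get(block, []):
            try:
                return self.read_from(datanode, block)
            except ConnectionError:
                continue
        return None
```

In this sketch the name-node is contacted only on open, when a read moves past the cached portion, or when every cached replica of a block fails.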

--Konstantin

Harold Lim wrote:
> Hi Konstantin,
>
>
> How long does the client keep the info in its cache? Or does it continue to use the info, until it becomes invalid (i.e., contacting a data node but the data node does not have that particular file anymore)?
>
> Thanks,
> Harold
>
> --- On Mon, 8/24/09, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
>
>> From: Konstantin Shvachko <[EMAIL PROTECTED]>
>> Subject: Re: Question about HDFS Architecture
>> To: [EMAIL PROTECTED]
>> Date: Monday, August 24, 2009, 9:40 PM
>> Harold,
>>
>> Both answers by Aaron were incorrect.
>>
>>> Does the client cache this information, or does it always talk to the namenode first?
>>
>> Yes, the client caches replica locations received from the name-node.
>> On open() it receives locations of the first 10 blocks of the file.
>> In most cases these are all file blocks. If not then the client will
>> get another portion of blocks when needed, and will also cache them.
>>
>>> Also, if a file has multiple replicas stored on multiple datanodes on the same "rack", how does the namenode pick which datanode the client has to talk to?
>>
>> The name-node returns block locations ordered by the proximity to the client.
>> The client always contacts data-nodes in this order. It cannot make any decisions
>> about the proximity because it does not possess knowledge about the cluster topology.
>> If all replicas are on the same rack but not local to the client then the ordering
>> returned by the name-node is arbitrary.
>> This may happen mostly if network topology is not configured.
>> Otherwise replicas should be distributed on different racks.
>> 3 replicas should be on at least 2 racks.
>>
>> Thanks
>> --Konstantin
>>
>>
>> Harold Lim wrote:
>>> To read/get a file, I understand that a client first contacts the namenode to determine which datanode has the file/block. Then, it contacts the datanode for the actual file.
>>> Does the client cache this information, or does it always talk to the namenode first?
>>> Also, if a file has multiple replicas stored on multiple datanodes on the same "rack", how does the namenode pick which datanode the client has to talk to? In this case, all datanodes are homogeneous, which makes the "rack-awareness" unimportant to the decision making.
>>> Thanks,
>>> Harold
>>>
>>>
>>>        
>
>
>      
>
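
The proximity ordering described in the quoted reply can be sketched as follows. This is an illustration under stated assumptions, not the name-node's implementation: the distance values 0/1/2, the rack_of map, and the function names are all hypothetical.

```python
from typing import Dict, List


def distance(client: str, datanode: str, rack_of: Dict[str, str]) -> int:
    """Assumed metric: 0 = same node, 1 = same rack, 2 = anything else."""
    if client == datanode:
        return 0
    if client in rack_of and rack_of[client] == rack_of.get(datanode):
        return 1
    return 2


def order_by_proximity(client: str, replicas: List[str],
                       rack_of: Dict[str, str]) -> List[str]:
    """Return replica locations sorted nearest-first, as seen from the client."""
    # sorted() is stable, so equidistant replicas keep their incoming order;
    # the thread only says that order is arbitrary in that case.
    return sorted(replicas, key=lambda dn: distance(client, dn, rack_of))


rack_of = {"a": "r1", "b": "r1", "c": "r2"}
print(order_by_proximity("a", ["c", "b", "a"], rack_of))
```

The ordering is done on the name-node side because, as the reply notes, the client has no knowledge of the cluster topology and simply contacts data-nodes in the order it is given.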
Todd Lipcon 2009-08-25, 04:43
Harold Lim 2009-08-25, 04:57
Todd Lipcon 2009-08-25, 05:21
Aaron Kimball 2009-08-28, 22:24