HDFS, mail # user - Question about HDFS Architecture


Re: Question about HDFS Architecture
Harold Lim 2009-08-25, 02:00
Hi Konstantin,
How long does the client keep the info in its cache? Or does it continue to use the info until it becomes invalid (e.g., it contacts a data node, but the data node does not have that particular file anymore)?

Thanks,
Harold

--- On Mon, 8/24/09, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:

> From: Konstantin Shvachko <[EMAIL PROTECTED]>
> Subject: Re: Question about HDFS Architecture
> To: [EMAIL PROTECTED]
> Date: Monday, August 24, 2009, 9:40 PM
> Harold,
>
> Both answers by Aaron were incorrect.
>
> > Does the client cache this information, or does it always talk to the namenode first?
>
> Yes, the client caches replica locations received from the name-node.
> On open() it receives locations of the first 10 blocks of the file.
> In most cases these are all file blocks. If not then the client will
> get another portion of blocks when needed, and will also cache them.
>
> > Also, if a file has multiple replicas stored on multiple datanodes on the same "rack", how does the namenode pick which datanode the client has to talk to?
>
> The name-node returns block locations ordered by the proximity to the client.
> The client always contacts data-nodes in this order.
> It cannot make any decisions about the proximity because it does not possess knowledge about the cluster topology.
> If all replicas are on the same rack but not local to the client then the ordering returned by the name-node is arbitrary.
> This may happen mostly if network topology is not configured.
> Otherwise replicas should be distributed on different racks.
> 3 replicas should be on at least 2 racks.
>
> Thanks
> --Konstantin
>
>
> Harold Lim wrote:
> > To read/get a file, I understand that a client first contacts the namenode to determine which datanode has the file/block. Then, it contacts the datanode for the actual file.
> >
> > Does the client cache this information, or does it always talk to the namenode first?
> > Also, if a file has multiple replicas stored on multiple datanodes on the same "rack", how does the namenode pick which datanode the client has to talk to? In this case, all datanodes are homogeneous, which makes the "rack-awareness" unimportant to the decision making.
> >
> > Thanks,
> > Harold
>
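
The two mechanisms described in the quoted reply (the name-node ordering replicas by proximity, and the client caching the locations it receives in batches) can be illustrated with a small toy model. This is only a sketch in Python, not HDFS client code: the names DataNode, order_replicas, and BlockLocationCache, the distance weights 0/2/4, and the fake_namenode callback are all assumptions made for illustration. The batch size of 10 comes from the message above.

```python
from dataclasses import dataclass

# Hypothetical distance weights (an assumption): lower means closer.
SAME_NODE, SAME_RACK, OFF_RACK = 0, 2, 4


@dataclass(frozen=True)
class DataNode:
    host: str
    rack: str


def distance(client, replica):
    """Toy proximity measure between the client and a replica holder."""
    if client.host == replica.host:
        return SAME_NODE
    if client.rack == replica.rack:
        return SAME_RACK
    return OFF_RACK


def order_replicas(client, replicas):
    """Name-node side: return replicas sorted nearest-first.

    sorted() is stable, so replicas at the same distance keep their
    input order; that stands in for the arbitrary ordering among ties.
    """
    return sorted(replicas, key=lambda r: distance(client, r))


class BlockLocationCache:
    """Client side: remember the ordered locations for each block index."""

    def __init__(self, fetch_locations, prefetch_blocks=10):
        # fetch_locations(start_block, count) -> {block_index: [DataNode, ...]}
        self._fetch = fetch_locations
        self._prefetch = prefetch_blocks
        self._cache = {}
        self.namenode_calls = 0

    def locations(self, block_index):
        # Ask the (simulated) name-node only when the block is not cached.
        if block_index not in self._cache:
            self.namenode_calls += 1
            self._cache.update(self._fetch(block_index, self._prefetch))
        return self._cache[block_index]


# Usage: a client on rack-a reading a file whose blocks have three replicas.
client = DataNode("node-a1", "/rack-a")
replicas = [
    DataNode("node-b1", "/rack-b"),
    DataNode("node-a2", "/rack-a"),
    DataNode("node-a1", "/rack-a"),
]


def fake_namenode(start, count):
    """Stand-in for the name-node: every block has the same replicas."""
    return {i: order_replicas(client, replicas) for i in range(start, start + count)}


cache = BlockLocationCache(fake_namenode)
nearest = cache.locations(0)[0]  # first fetch covers blocks 0-9
cache.locations(5)               # answered from the cache
cache.locations(12)              # outside the first batch, so a second fetch
```

In this model the client never reorders the list it is given; it just takes the first entry, which matches the point that only the name-node knows the topology. The sketch does not model cache expiry or invalidation, which is the open question in the follow-up at the top of this message.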