HDFS >> mail # user >> Question about HDFS Architecture


Re: Question about HDFS Architecture
Hi Todd,

Yes, my question is about multiple re-opens. For example, I have an application that opens and reads a file depending on what a user chooses. So in this case, block locations are not cached from one open to the next?

Thanks,
Harold
--- On Tue, 8/25/09, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> From: Todd Lipcon <[EMAIL PROTECTED]>
> Subject: Re: Question about HDFS Architecture
> To: [EMAIL PROTECTED]
> Date: Tuesday, August 25, 2009, 12:43 AM
> On Mon, Aug 24, 2009 at 6:40 PM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
>
> > Harold,
> >
> > Both answers by Aaron were incorrect.
> >
> > > Does the client cache this information, or does it always talk to the namenode first?
> >
> > Yes, the client caches replica locations received from the name-node.
> > On open() it receives locations of the first 10 blocks of the file.
> > In most cases these are all file blocks. If not then the client will
> > get another portion of blocks when needed, and will also cache them.
>
> This is only within a single DFSInputStream. The block location cache
> does not persist across re-opens of the same file. As I read the
> original question, it was about longer-term caching, not just keeping
> state during a single DFSInputStream.
>
> -Todd
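
The behaviour described above can be illustrated with a small toy model. This is only a sketch and does not use any Hadoop classes: ToyNameNode, ToyInputStream and BlockLocationCacheDemo are hypothetical names invented for the illustration, and the only detail taken from the thread is that locations arrive in portions of 10 blocks and are cached per stream. The real DFSInputStream may differ in many ways.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the name-node; counts the location lookups it serves. */
class ToyNameNode {
    static final int PORTION = 10; // "first 10 blocks", as stated in the thread
    private final int totalBlocks;
    int lookups = 0;

    ToyNameNode(int totalBlocks) {
        this.totalBlocks = totalBlocks;
    }

    /** Returns locations for up to PORTION blocks, starting at firstBlock. */
    List<String> getBlockLocations(int firstBlock) {
        lookups++;
        List<String> out = new ArrayList<>();
        int end = Math.min(totalBlocks, firstBlock + PORTION);
        for (int b = firstBlock; b < end; b++) {
            out.add("datanode-for-block-" + b);
        }
        return out;
    }
}

/** Hypothetical stand-in for a DFSInputStream; the cache belongs to this one stream. */
class ToyInputStream {
    private final ToyNameNode nameNode;
    private final List<String> cachedLocations = new ArrayList<>();

    ToyInputStream(ToyNameNode nameNode) {
        this.nameNode = nameNode;
        // On open: fetch the first portion of block locations.
        cachedLocations.addAll(nameNode.getBlockLocations(0));
    }

    String locate(int block) {
        // Ask the name-node again only if the block is beyond what is cached.
        while (block >= cachedLocations.size()) {
            List<String> more = nameNode.getBlockLocations(cachedLocations.size());
            if (more.isEmpty()) {
                throw new IndexOutOfBoundsException("no such block: " + block);
            }
            cachedLocations.addAll(more);
        }
        return cachedLocations.get(block);
    }
}

class BlockLocationCacheDemo {
    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(12);           // a file with 12 blocks
        ToyInputStream first = new ToyInputStream(nn);  // lookup 1 (open)
        first.locate(3);                                // served from the cache
        first.locate(3);                                // served from the cache
        first.locate(11);                               // lookup 2 (next portion)
        ToyInputStream second = new ToyInputStream(nn); // lookup 3 (re-open, empty cache)
        second.locate(3);                               // served from the new stream's cache
        System.out.println("name-node lookups: " + nn.lookups); // prints 3
    }
}
```

In this model, repeated reads through the same stream cost nothing extra, while every re-open pays for a fresh lookup. An application that re-opens the same files often would need to keep the stream open or maintain its own cache if it wanted to avoid that.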