-
Regarding design of HDFS
Sesha Kumar 2011-08-25, 08:04
Hi all, I am trying to get a good understanding of how Hadoop works for my undergraduate project. I have the following questions:
1. Why does the namenode store the blockmap (block-to-datanode mapping) in main memory for all files, even those that are not in use?
2. Why can't the namenode move part of the blockmap from main memory to a secondary storage device when free main memory becomes scarce (due to a large number of files)?
3. Why can't the blockmap be constructed when a client requests a file and then be cached for later accesses?
-
Re: Regarding design of HDFS
Jean-Daniel Cryans 2011-08-25, 16:45
To get an answer to that sort of question, you first need to show that you did your own homework, e.g., write down what you think the answer is based on your observations and reading. Then I'm sure someone will be happy to help you.
J-D
-
Re: Regarding design of HDFS
Sesha Kumar 2011-09-05, 14:29
On Thu, Aug 25, 2011 at 1:34 PM, Sesha Kumar <[EMAIL PROTECTED]> wrote:
> Hi all,
> I am trying to get a good understanding of how Hadoop works, for my
> undergraduate project. I have the following questions/doubts :
>
> 1. Why does namenode store the blockmap (block to datanode mapping) in the
> main memory for all the files, even those that are not used?
>
> 2. Why cant namenode move out a part of the blockmap from main memory to a
> secondary storage device, when free space in main memory becomes scarce (
> due to large number of files) ?
>
> 3. Why cant the blockmap be constructed when a file is requested (by a
> client) and then be cached for later accesses?
Regarding my earlier post, quoted above: from what I've read and understood,
1. The namenode stores the blockmap for all blocks in its main memory. This lets it keep an up-to-date snapshot of the whole filesystem. But I feel that, since the blockmap is not constant data, keeping all of it in main memory all the time could be avoided in order to save memory. The blockmap details could instead be fetched when a client requests a file. Since main memory limits how many files can be added to the filesystem, as in the small-files case, this approach could save space. Only the first fetch would take more time; after that we would have streaming data access.
I want to know why this was not considered or, if it was considered, why it was not implemented. Am I missing anything obvious? All replies from the namenode are for heartbeat signals. I am not sure about the time trade-off. Would it be much bigger? Is the initial access time as important as streaming access?
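The idea described here can be sketched as a small LRU cache in front of a slower store. This is only a toy model: the class name `PagedBlockMap`, the dict standing in for secondary storage, and the `slow_fetches` counter are all hypothetical, and it ignores updates coming from datanodes entirely.

```python
from collections import OrderedDict


class PagedBlockMap:
    """Hypothetical LRU cache of per-file block locations.

    `backing_store` is a plain dict standing in for secondary storage;
    nothing here is taken from the HDFS code base.
    """

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store
        self.capacity = capacity          # max number of files kept in memory
        self.cache = OrderedDict()        # path -> list of block locations
        self.slow_fetches = 0             # how often we had to go to "disk"

    def locations(self, path):
        if path in self.cache:
            # Hit: mark the entry as most recently used.
            self.cache.move_to_end(path)
            return self.cache[path]
        # Miss: the first access pays the cost of the slow fetch.
        self.slow_fetches += 1
        entry = self.backing_store[path]
        self.cache[path] = entry
        if len(self.cache) > self.capacity:
            # Evict the least recently used file to bound memory use.
            self.cache.popitem(last=False)
        return entry
```

With a capacity of 2, reading `/a`, `/b` and then `/c` evicts `/a`, so the next request for `/a` pays the slow fetch again. That repeated cost is the time trade-off in question.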
-
Re: Regarding design of HDFS
Ted Dunning 2011-09-05, 17:53
The namenode is already a serious bottleneck for meta-data updates. If you allow some of the block map or meta-data to page out to disk, then the bottleneck is going to get much worse.
The only way to avoid this is to make the data much more cacheable and to have a viable cache coherency strategy. Cache coherency at the meta-data level is difficult. Cache coherency at the block level is also difficult (but not as difficult) because many blocks get moved for balance purposes.
The MapR approach is a useful counter-example here since the architecture was specifically designed so that the only centralized data could be cached indefinitely because coherency can be checked on access. This dramatically increases the distribution of the location information which in turn makes the centralized copy much smaller and more pageable. The virtuous cycle continues by making the distributed resources read/write so that meta-data needn't be centralized.
It is very hard for me to understand how to evolutionarily migrate the current HDFS architecture to something that admits paging of data to disk. The problem is that there are logical circularities with the current approach that force either the current design or a major rebuild from the ground up.
On Mon, Sep 5, 2011 at 9:29 AM, Sesha Kumar <[EMAIL PROTECTED]> wrote:
> 1. Namenode stores blockmaps for all the blocks in its main memory. This
> can be used to keep an up-to-date snapshot of total filesystem. But what i
> feel is this blockmap is not a constant data and hence storing it in main
> memory all the time can be avoided in order to save main memory space. On a
> request for a file from the client the blockmap details can be fetched.
-
Re: Regarding design of HDFS
Dhruba Borthakur 2011-09-06, 04:52
My answers inline.
1. Why does namenode store the blockmap (block to datanode mapping) in the main memory for all the files, even those that are not used?
The block to datanode mapping is needed for two reasons: when a client wants to read a file, the namenode has to tell the client the locations of the blocks that make up the file. Also, when a datanode dies, the namenode has to quickly find the blocks that resided on that datanode so that it can re-replicate those blocks.
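The two uses above amount to looking the same mapping up in both directions. A minimal sketch, assuming simple in-memory dictionaries; the class `BlockMap`, its method names, and the `target_replicas` parameter are hypothetical illustrations, not the NameNode's actual data structures:

```python
from collections import defaultdict


class BlockMap:
    """Toy two-way index between blocks and datanodes (hypothetical names)."""

    def __init__(self):
        self.file_blocks = defaultdict(list)  # path -> ordered block ids
        self.block_nodes = defaultdict(set)   # block id -> datanodes holding it
        self.node_blocks = defaultdict(set)   # datanode -> block ids it holds

    def add_replica(self, path, block_id, datanode):
        if block_id not in self.file_blocks[path]:
            self.file_blocks[path].append(block_id)
        self.block_nodes[block_id].add(datanode)
        self.node_blocks[datanode].add(block_id)

    def locate(self, path):
        """Reason 1: tell a reading client where each block of a file lives."""
        return [(b, sorted(self.block_nodes[b]))
                for b in self.file_blocks.get(path, [])]

    def datanode_died(self, datanode, target_replicas=3):
        """Reason 2: find blocks that need re-replication after a node dies."""
        under_replicated = []
        for b in self.node_blocks.pop(datanode, set()):
            self.block_nodes[b].discard(datanode)
            if len(self.block_nodes[b]) < target_replicas:
                under_replicated.append(b)
        return sorted(under_replicated)
```

The reverse index `node_blocks` is what makes the second case quick: without it, finding the blocks of a dead datanode would mean scanning every block entry.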
2. Why cant namenode move out a part of the blockmap from main memory to a secondary storage device, when free space in main memory becomes scarce ( due to large number of files) ?
3. Why cant the blockmap be constructed when a file is requested (by a client) and then be cached for later accesses?
Both of the above can be done if needed. But when there is a better way to scale, why do this? Please look at my comments below.
> The only way to avoid this is to make the data much more cacheable and to
> have a viable cache coherency strategy. Cache coherency at the meta-data
> level is difficult. Cache coherency at the block level is also difficult
> (but not as difficult) because many blocks get moved for balance purposes.
I would argue that a federation-model is much more scalable, elegant and easier to maintain. It takes a very well-oiled building block like the NameNode and allows you to use multitudes of them in a single [...]. This is already part of HDFS trunk code base.
thanks,
dhruba
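The general idea of using several NameNodes side by side can be sketched as a table that routes each path to the NameNode owning that part of the namespace. This is an illustration only: `MountTable`, `namenode_for`, and the names `nn1` to `nn3` are hypothetical and are not taken from the HDFS trunk code.

```python
class MountTable:
    """Hypothetical routing table from path prefixes to NameNodes."""

    def __init__(self, mounts):
        # Check longer prefixes first so the most specific mount wins.
        self.mounts = sorted(mounts.items(),
                             key=lambda kv: len(kv[0]), reverse=True)

    def namenode_for(self, path):
        for prefix, namenode in self.mounts:
            # Match whole path components only, so "/data" does not
            # accidentally capture "/database".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return namenode
        raise KeyError(path)


table = MountTable({"/user": "nn1", "/data": "nn2", "/data/logs": "nn3"})
print(table.namenode_for("/data/logs/2011-09-06.log"))
```

Each NameNode in such a scheme still keeps its own share of the blockmap fully in memory; the partitioning spreads the total across machines rather than shrinking it.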
-
Re: Regarding design of HDFS
Sesha Kumar 2011-09-07, 14:35
thanks a lot
-
RE: Regarding design of HDFS
kang hua 2011-09-13, 03:38
Hi Master:
Can you explain this in more detail?
"The only way to avoid this is to make the data much more cacheable and to have a viable cache coherency strategy. Cache coherency at the meta-data level is difficult. Cache coherency at the block level is also difficult (but not as difficult) because many blocks get moved for balance purposes"
Why is "Cache coherency at the meta-data level is difficult"?
Why is "Cache coherency at the block level is also difficult (but not as difficult) because many blocks get moved for balance purposes"?
Thanks a lot!
kanghua
-
Re: Regarding design of HDFS
Ted Dunning 2011-09-13, 11:20
2011/9/13 kang hua <[EMAIL PROTECTED]>
> Hi Master:
> can you explain more detail --- "The only way to avoid this is to
> make the data much more cacheable and to have a viable cache coherency
> strategy. Cache coherency at the meta-data level is difficult. Cache
> coherency at the block level is also difficult (but not as difficult)
> because many blocks get moved for balance purposes"
> why "Cache coherency at the meta-data level is difficult" ?
I said this because meta-data is updated often. Caching in the presence of high updates requires some sort of coherency model. For meta-data, it is difficult to detect stale information on use and use of stale information can be disastrous. Thus, caching is difficult.
> why "Cache coherency at the block level is also difficult (but not as
> difficult) because many blocks get moved for balance purposes"
The basic problem here is update rate. Late detection of stale information is much easier, however, since you can just note that the block isn't where you thought it was and update your cache. There are still problems, and the fact that race conditions are still being found in the HDFS lease management code is an indicator that this isn't a completely trivial problem.
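The "note that the block isn't where you thought it was and update your cache" step can be sketched as below. This is a toy model under stated assumptions: `authority` stands in for whatever holds the true block locations, `datanodes` models which blocks each node actually has, and all names are hypothetical rather than HDFS client code.

```python
class BlockLocationCache:
    """Toy cache that detects stale block locations at the time of use."""

    def __init__(self, authority, datanodes):
        self.authority = authority  # block id -> datanode (source of truth)
        self.datanodes = datanodes  # datanode -> set of block ids it holds
        self.cached = {}            # block id -> last known datanode
        self.refreshes = 0          # number of trips back to the authority

    def read(self, block_id):
        node = self.cached.get(block_id)
        if node is None or block_id not in self.datanodes[node]:
            # Either we never cached this block, or the cached node no
            # longer has it (e.g. it was moved by rebalancing). The miss
            # itself reveals the staleness, so just refresh and continue.
            node = self.authority[block_id]
            self.cached[block_id] = node
            self.refreshes += 1
        return node
```

This works for block locations because a wrong answer is detected the moment it is used. The sketch has no equivalent check for meta-data, where stale information may not reveal itself on use.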
-
Re: Regarding design of HDFS
Kanghua151 2011-09-15, 08:56
I get it. Thanks.
Sent from my iPhone
-
Re: Regarding design of HDFS
Aaron Eng 2011-09-13, 04:16
>> The only way to avoid this is to make the data much more cacheable and to
>> have a viable cache coherency strategy. Cache coherency at the meta-data
>> level is difficult. Cache coherency at the block level is also difficult
>> (but not as difficult) because many blocks get moved for balance purposes.
>
> I would argue that a federation-model is much more scalable, elegant and
> easier to maintain. It takes a very well-oiled building block like the
> NameNode and allows you to use multitudes of them in a single This is
> already part of HDFS trunk code base.
I think Sesha's questions are about the memory footprint of the namenode and why it has to operate the way it does: why isn't it possible for the existing namenode capabilities to be implemented in a design that uses less memory? I think the motivation for that question is the assessment that the resource utilization profile of the NameNode is inefficient: it occupies large amounts of memory, but in many use cases much of that memory is accessed infrequently. I do not think his question was about how to further scale out an inefficient system. I would argue that federation is a way of taking an inefficient system and scaling it out such that the system is larger but with the same proportion of inefficiency. I don't think it addresses the problem Sesha is asking about.
Sesha,
The short answer, as you've probably gathered, is that it is difficult to adapt the existing HDFS code base to support the type of model you are thinking about.