-
problem of lost name-node
Mirko Kämpf 2011-09-27, 15:36
Hi, during the Cloudera Developer Training in Berlin I came up with an idea regarding a lost name-node. In this case all data blocks are lost. A solution could be to keep a table on the data node which relates filenames and block_ids and which can be scanned after a name-node is lost. Alternatively, every block could carry a kind of backlink to the filename, with the total number of blocks and/or a total hash sum attached. This would make it easy to recover with minimal overhead.
Now I would like to ask the developer community whether there is any good reason not to do this, before I start to figure out where to begin implementing such a feature.
Thanks, Mirko
-
problem of lost name-node
Mirko Kämpf 2011-09-28, 07:06
Hello, during the Cloudera Developer Training in Berlin I came up with an idea regarding a lost name-node.
As in this case all data blocks are lost, a solution could be to keep, on each data node, a table which relates filenames and block_ids for the blocks stored there. This table can be scanned after a name-node is lost. Or every block could even carry a kind of backlink to the filename, with the total number of blocks and/or a total hash sum attached. This would make it easy to recover a broken HDFS with minimal overhead.
Now I would like to ask the developer community whether there is any good reason not to do this, before I start to figure out where to begin implementing such a feature.
The name node would then no longer be such a high risk, I think.
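The recovery scan described above can be sketched as follows. This is only an illustration of the idea, not existing HDFS code: the `Backlink` record, its fields, and `rebuild_namespace` are hypothetical names, and the sketch assumes each data node can report a table mapping block_id to (path, block index, total number of blocks).

```python
from collections import defaultdict
from typing import NamedTuple


class Backlink(NamedTuple):
    """Hypothetical per-block record kept on a data node."""
    path: str          # full path of the file the block belongs to
    index: int         # position of the block within that file
    total_blocks: int  # number of blocks the file should have


def rebuild_namespace(datanode_tables):
    """Merge per-datanode {block_id: Backlink} tables after a name-node loss.

    Returns (complete, incomplete):
      complete   maps path -> ordered list of block ids
      incomplete maps path -> sorted list of missing block indexes
    """
    found = defaultdict(dict)  # path -> {index: block_id}
    totals = {}                # path -> expected number of blocks
    for table in datanode_tables:
        for block_id, link in table.items():
            # Replicas of the same block simply overwrite each other.
            found[link.path][link.index] = block_id
            totals[link.path] = link.total_blocks

    complete, incomplete = {}, {}
    for path, blocks in found.items():
        missing = [i for i in range(totals[path]) if i not in blocks]
        if missing:
            incomplete[path] = missing
        else:
            complete[path] = [blocks[i] for i in range(totals[path])]
    return complete, incomplete
```

The total block count is what lets the scan tell a fully recovered file from one with gaps; a hash sum over the whole file could be checked in the same place.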
Thanks a lot, Mirko
-
Re: problem of lost name-node
Ravi Prakash 2011-09-28, 13:27
Hi Mirko,
It seems like a great idea to me! The architects and senior developers might have some more insight on this, though.
I think part of the reason why the community might be lazy about implementing this is that the Namenode being a single point of failure is usually regarded as FUD. There are simple tricks (like writing the fsimage and edits log to NFS) which can guard against some failure scenarios, and I think most users of Hadoop are satisfied with that.
I wouldn't be too surprised if there is already a JIRA for this. But if you could come up with a patch, I'm hopeful the community would be interested in it.
Cheers Ravi
-
Re: problem of lost name-node
Robert Evans 2011-09-28, 13:55
There is also some work underway to add HA and failover to the namenode. You might have more success if you send your note to hdfs-dev instead of common-dev. One other thing that can sometimes get a discussion going is to just file a JIRA for it. People interested in it are likely to start watching it, and you can often have a good conversation about it there.
--Bobby Evans
-
Re: problem of lost name-node
Steve Loughran 2011-09-28, 14:06
One of the issues here is keeping that list up to date. You don't want filename operations on the NN to push out changes to datanodes (which may not be there, after all), and you don't necessarily want every block creation operation on a DN to force an update on what effectively becomes a mini-db of (filename, block) mappings. Yes, it could just be a text file, but you still need to push out atomic updates which don't lose the previous version on a power failure. That update would have to be thread safe, and you would have to decide whether to make it save-immediately or lazy-write.
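For the text-file variant, one common way to get an update that is atomic and keeps the previous version is to write a temporary file, fsync it, and rename it into place. The sketch below assumes a POSIX-style filesystem; `atomic_write`, `read_latest`, and the `.prev` suffix are made-up names for illustration, and this is the save-immediately choice rather than lazy-write.

```python
import os
import tempfile
import threading

_write_lock = threading.Lock()  # serialises writers within this process


def atomic_write(path, data):
    """Replace `path` with `data` (bytes), keeping the old copy as path + '.prev'."""
    directory = os.path.dirname(os.path.abspath(path))
    with _write_lock:
        fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # force the new bytes to disk first
            if os.path.exists(path):
                os.replace(path, path + ".prev")  # keep the previous version
            os.replace(tmp, path)  # rename the new version into place
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise


def read_latest(path):
    """Read the current file, falling back to the previous version."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        with open(path + ".prev", "rb") as f:
            return f.read()
```

If the process dies between the two renames, only the `.prev` copy exists, which is why the reader falls back to it. A fuller version would also fsync the containing directory, and the lock only covers threads in one process.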
In the situation in which your NN loses the entire image -and all its backups- you are going to lose the directory tree. All the per-DN metadata would do is leave you with some useful filenames (2011_09_22_EMEA_paying_customers.csv.lzo) and lots that aren't (mapout0043.something). Someone is still going to have to try and recreate what appears to be a functional directory tree from it. Then once you add layers on top like HBase, life is even more complicated as the filenames will stop bearing any relationship to the content.
I'd go for a process that makes checkpointing NN state more reliable. That could include making it easier for the secondary namenode to push out updates to worker nodes in the system that can store timestamped/version-stamped copies of the state; it could be improving recovery of state, and it could be better code to make sure that the secondary Namenode is actually working. Because you will need a secondary namenode on any cluster of moderate size, and you will need to make sure it is working, and test it.
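As a rough illustration of version-stamped copies of the state, the sketch below stores each copy under a name that carries a version number and a checksum, and on recovery picks the newest copy that still verifies. The file naming scheme and both function names are assumptions made for this example; they are not the layout the namenode or secondary namenode actually uses.

```python
import hashlib
import os


def store_checkpoint(directory, version, image):
    """Write `image` (bytes) as fsimage.<version>.<sha256> and return its path."""
    digest = hashlib.sha256(image).hexdigest()
    path = os.path.join(directory, f"fsimage.{version:012d}.{digest}")
    with open(path, "wb") as f:
        f.write(image)
    return path


def latest_valid_checkpoint(directory):
    """Return (version, image) for the newest copy whose checksum matches, else None."""
    candidates = []
    for name in os.listdir(directory):
        parts = name.split(".")
        if len(parts) == 3 and parts[0] == "fsimage" and parts[1].isdigit():
            candidates.append((int(parts[1]), parts[2], name))
    for version, digest, name in sorted(candidates, reverse=True):
        with open(os.path.join(directory, name), "rb") as f:
            data = f.read()
        if hashlib.sha256(data).hexdigest() == digest:
            return version, data
        # Checksum mismatch: skip this copy and try the next older one.
    return None
```

Running the verification step on a schedule, rather than only during recovery, is one way to test that the copies are usable before they are needed.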
-
Re: problem of lost name-node
Ravi Prakash 2011-09-29, 14:13
Hi,
@Mirko: Please file a JIRA. This seems an appropriate time.
@Steve: If we store the absolute filenames (i.e. the whole path), would we still have the problem you outlined in the 2nd para? I do agree that updates would have to be pushed out and that might be cumbersome, but hey, we are processing heartbeats from the datanodes every 3 seconds anyway. Maybe we can piggyback those updates? I'm sure there are better solutions as well, and I don't think these problems are show-stoppers. If this solution helps to decrease the FUD, then I think it might be worth it (apart from its merit).
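The piggybacking idea could look roughly like this: the namenode side queues (block_id, path) changes per datanode and hands out a bounded batch whenever that datanode's heartbeat arrives. The class, method names, and batch limit are hypothetical and only model the queueing; they do not reflect the real heartbeat protocol.

```python
from collections import defaultdict, deque


class PendingUpdates:
    """Hypothetical namenode-side buffer of mapping updates per datanode."""

    def __init__(self, max_per_heartbeat=100):
        self._queues = defaultdict(deque)  # datanode_id -> pending updates
        self._max = max_per_heartbeat

    def record(self, datanode_id, block_id, path):
        """Queue a (block_id, path) change for a datanode holding that block."""
        self._queues[datanode_id].append((block_id, path))

    def on_heartbeat(self, datanode_id):
        """Return the updates to attach to this heartbeat's reply."""
        queue = self._queues[datanode_id]
        batch = []
        while queue and len(batch) < self._max:
            batch.append(queue.popleft())
        return batch
```

Updates for a datanode that is not currently there simply stay queued until it reports in again, and the batch limit keeps a burst of renames from bloating a single reply.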
Just my $.02 Ravi