Re: why not hadoop backup name node data to local disk daily or hourly?
周梦想 2012-12-24, 10:40
Thanks to Harsh and Mohammad. Because of the data crash, I got ill, so reply
2012/12/20 Harsh J <[EMAIL PROTECTED]>
> On Thu, Dec 20, 2012 at 3:18 PM, 周梦想 <[EMAIL PROTECTED]> wrote:
> > For some reason my name node data became corrupt, and the bad data also
> > overwrote the secondary name node data and the NFS backup. I want to
> > restore the name node data from a day ago or even a week ago, but I can't.
> The SecondaryNameNode does this, and that is also why it is
> recommended to run. In HA HDFS, the StandbyNameNode does the same
> action of checkpoints as SecondaryNameNode, to achieve the same
> periodic goal.
Actually, the problem began at the SecondaryNameNode. We changed all the IPs
of the Hadoop system. It ran fine for about 2 hours. Then my monitoring script
sent me an email saying that the SNN had exited. It couldn't be started again;
every time it reported a NULL exception. So we tried stopping the whole Hadoop
system and starting it again. Unfortunately, this time even the NN couldn't
start, and it reported the same error.
After that we tried several approaches, including importing the checkpoint
from the SNN, but none of them worked. We found that every copy of the
NameNode data was corrupt. Then we removed edits.new and reset the edits file,
and the NN started fine. But HBase began complaining that it could not find
blocks, and even the .META. table had errors. hbck reported many block errors.
We tried changing the IPs back to the old ones, but the problems remained.
We can't even roll back to the old NN data from before the IP change.
> This form of corruption at the SNN too should *never* occur normally,
> and your SNN last-checkpoint-time should be actively monitored to not
> grow too old (a sign of issues). Your version of Hadoop probably is
> still affected by https://issues.apache.org/jira/browse/HDFS-3652 and
> you should update to avoid loss due to it?
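As an illustration of the monitoring Harsh suggests, here is a minimal sketch that flags a checkpoint that has grown too old by looking at the modification time of the checkpoint image file. The file path, the function names and the two-hour threshold are assumptions chosen for the example, not values taken from Hadoop itself.

```python
import os
import time


def checkpoint_age_seconds(fsimage_path, now=None):
    """Return how many seconds ago the checkpoint image was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(fsimage_path)


def checkpoint_is_stale(fsimage_path, max_age_seconds=2 * 3600, now=None):
    """Return True if the image is missing or older than max_age_seconds."""
    if not os.path.exists(fsimage_path):
        # A missing image is treated as a problem worth alerting on.
        return True
    return checkpoint_age_seconds(fsimage_path, now) > max_age_seconds
```

A cron job could call checkpoint_is_stale() on the fsimage under the SNN checkpoint directory (wherever that lives on your cluster) and send an alert when it returns True.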
> Also, if you ever suspect a local copy of NN to be bad, save its
> namespace (hadoop dfsadmin -saveNamespace, requires NN be put in
> safemode first) before you bring it down - this saves a copy from the
> memory onto the disk.
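The sequence Harsh describes (enter safemode, then save the namespace) could be scripted roughly as follows. This is only a sketch: the wrapper functions and the hadoop_bin and leave_safemode parameters are hypothetical, and it assumes the hadoop command is on the PATH of a user allowed to run dfsadmin.

```python
import subprocess


def save_namespace_commands(hadoop_bin="hadoop", leave_safemode=False):
    """Build the dfsadmin commands in the order they must run."""
    commands = [
        # saveNamespace requires the NN to be in safemode first.
        [hadoop_bin, "dfsadmin", "-safemode", "enter"],
        # Write the in-memory namespace out to the NN's disk.
        [hadoop_bin, "dfsadmin", "-saveNamespace"],
    ]
    if leave_safemode:
        # Only needed if the NN is going to keep serving afterwards.
        commands.append([hadoop_bin, "dfsadmin", "-safemode", "leave"])
    return commands


def save_namespace(run=subprocess.check_call, **kwargs):
    """Run each command in turn; check_call stops at the first failure."""
    for cmd in save_namespace_commands(**kwargs):
        run(cmd)
```

Passing a different run callable (for example one that only prints the command) gives a dry run without touching the cluster.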
> > Do I have to back
> > up the name node data manually or write a bash script to back it up? Why
> > isn't there a configuration option to back up name node data to local disk
> > daily or hourly under different timestamped names?
> If the NN's disk itself is corrupt, backing it up would be no good
> either, so this solution vs. SNN still doesn't solve anything of your
> original issue.
The NN and SNN design only protects against one machine being corrupted, but
it can't roll back to an earlier point in time.
If for some reason errors are introduced into the NN and spread to the SNN,
do we have any quick way to recover?
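To make the kind of rollback I mean concrete, here is a rough sketch of timestamped copies with a retention limit. The function name, the "nn-meta-" prefix and the keep parameter are made up for the example. As Harsh points out, a plain copy like this neither validates the image nor checkpoints it, and copying a metadata directory while it is being written may capture an inconsistent state, so it would probably be safer to copy a finished checkpoint or a directory written by saveNamespace.

```python
import os
import shutil
import time


def snapshot_metadata(src_dir, backup_root, keep=24, now=None):
    """Copy src_dir to a new timestamped directory and prune old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # UTC timestamps in this format sort correctly as plain strings.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    dest = os.path.join(backup_root, "nn-meta-" + stamp)
    shutil.copytree(src_dir, dest)
    snapshots = sorted(
        name for name in os.listdir(backup_root) if name.startswith("nn-meta-")
    )
    # Keep only the newest `keep` snapshots, delete the rest.
    for old in snapshots[:-keep]:
        shutil.rmtree(os.path.join(backup_root, old))
    return dest
```

Run hourly with keep=24, this would leave one day of hourly copies to roll back to; a second daily job with its own backup_root could cover a week.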
> > The same question applies to HBase's .META. and -ROOT- tables. I think
> > their history is 100 times more important than the log history.
> The HBase .META. and -ROOT- are already on HDFS, so are pretty
> reliable (with HBase's WAL and 3x replication of blocks).
Just because of the Hadoop NN's problem, HBase can't find its tables and data.
> > I think it could be implemented in the Secondary Name Node, Checkpoint Node
> > or Backup Node. For now I do this with just a bash script.
> I don't think using a bash script to backup the metadata is a better
> solution than relying on the SecondaryNameNode. Two reasons: It does
> the same form of a copy-backup (no validation like SNN does), and it
> does not checkpoint (i.e. merge the edits into the fsimage).
I'm using the SNN too, but I'm afraid of both the NameNode and SNN data becoming corrupt.
> Harsh J