Bjoern Schiessle
2010-12-22, 15:03
daniel sikar
2010-12-22, 18:01
li ping
2010-12-23, 01:30
Bjoern Schiessle
2010-12-23, 10:50
li ping
2010-12-23, 12:45
rahul patodi
2010-12-23, 15:29
Aaron T. Myers
2010-12-23, 17:15
Todd Lipcon
2010-12-23, 20:02
Jakob Homan
2010-12-23, 20:47
Todd Lipcon
2010-12-23, 21:08
Bjoern Schiessle
2010-12-23, 22:05
Bjoern Schiessle
2010-12-23, 22:06
Ryan Rawson
2010-12-23, 22:46
-
namenode doesn't start after reboot
Bjoern Schiessle 2010-12-22, 15:03

Hi,

After a kernel update and a reboot the namenode doesn't start. I run the Cloudera CDH3 Hadoop distribution. I have already searched for a solution, and it looks like I'm not the only one with this problem. Sadly, I could only find descriptions of similar problems, but no solutions.

This is the error message from the namenode log file:

2010-12-22 16:13:04,830 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = pcube/129.69.216.24
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2+737
STARTUP_MSG: build = -r 98c55c28258aa6f42250569bd7fa431ac657bdbd; compiled by 'root' on Mon Oct 11 17:21:30 UTC 2010
************************************************************/
2010-12-22 16:13:05,001 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-12-22 16:13:05,007 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2010-12-22 16:13:05,040 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2010-12-22 16:13:05,335 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-12-22 16:13:05,336 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-12-22 16:13:05,361 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 72
2010-12-22 16:13:05,374 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 3
2010-12-22 16:13:05,375 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 8822 loaded in 0 seconds.
2010-12-22 16:13:05,377 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1034)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)
2010-12-22 16:13:05,377 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at pcube/129.69.216.24
************************************************************/

Any idea what could be wrong and how I can get my namenode up and running again?

Thanks a lot!
Björn
-
Re: namenode doesn't start after reboot
daniel sikar 2010-12-22, 18:01

I can't help, but with hindsight: it's advisable to snapshot your namenodes, as HDFS dies with them.
-
Re: namenode doesn't start after reboot
li ping 2010-12-23, 01:30

It seems the exception occurs while the NameNode loads the edit log. Make sure the edit log file exists, or debug the application to see what's wrong.
-
Re: namenode doesn't start after reboot
Bjoern Schiessle 2010-12-23, 10:50

Hi,

On Thu, 23 Dec 2010 09:30:17 +0800 li ping wrote:
> It seems the exception occurs while the NameNode loads the edit log.
> Make sure the edit log file exists, or debug the application to
> see what's wrong.

Last night I tried to fix the problem and made a big mistake. Instead of copying /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits and edits.new to a backup, I moved them, and later deleted the only version because I thought I had a copy.

The good thing: The namenode starts again.
The bad thing: My file system is now in an inconsistent state.

Probably the only solution is to reformat HDFS and start from scratch. Thankfully there wasn't much data stored in HDFS yet, but I definitely have to make sure that this doesn't happen again:

1. I have set up a second dfs.name.dir which is stored on another computer (mounted by sshfs)
2. I will install a backup script similar to: http://blog.milford.io/2010/10/simple-hadoop-namenode-backup-script

Do you think this should be enough to overcome such situations in the future? Any additional ideas on how to make it safer?

I'm still a little bit afraid when I think about the next time I will have to reboot the server. Shouldn't a reboot safely stop and restart all Hadoop services? Is there anything I can do to make sure that the next reboot will not cause the same problems?

Thanks a lot!
Björn
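
The mistake described above (moving the edits files instead of copying them) is the kind of thing a small copy-first helper can guard against. The following is only a rough sketch, not the backup script linked in the message: the function name backup_name_dir and the backup location are hypothetical, and it assumes the NameNode is stopped while the copy is taken. It copies the whole name directory to a new timestamped folder and never modifies the original.

```python
import shutil
import time
from pathlib import Path


def backup_name_dir(name_dir, backup_root):
    """Copy a NameNode metadata directory into a new timestamped folder.

    Hypothetical helper for illustration: the source is only read, never
    moved or deleted, so a failed repair attempt can be rolled back.
    """
    src = Path(name_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"name dir not found: {src}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"

    # copytree raises an error if dest already exists, so an older
    # backup is never silently overwritten.
    shutil.copytree(src, dest)

    # Cheap sanity check: every source file must exist in the copy
    # with the same size before the backup is trusted.
    for f in src.rglob("*"):
        if f.is_file():
            copy = dest / f.relative_to(src)
            if not copy.is_file() or copy.stat().st_size != f.stat().st_size:
                raise IOError(f"backup incomplete for {f}")
    return dest
```

With the path from the message, a call might look like backup_name_dir("/var/lib/hadoop-0.20/cache/hadoop/dfs/name", "/some/backup/location"), where the second argument is a placeholder. Only after it returns would one experiment with the edits files.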
-
Re: namenode doesn't start after reboot
li ping 2010-12-23, 12:45

As far as I know, setting up a backup namenode dir is enough.

I haven't used Hadoop in a production environment, so I can't tell you what would be the right way to reboot the server.

--
李平
-
Re: namenode doesn't start after reboot
rahul patodi 2010-12-23, 15:29

Hi,

If you want to reboot the server:
1. Stop mapred
2. Stop dfs
Then reboot. When you want to restart Hadoop again, start dfs first, then mapred.

--
Regards,
Rahul Patodi
Software Engineer,
Impetus Infotech (India) Pvt Ltd,
www.impetus.com
Mob:09907074413
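
The ordering described in this message (MapReduce down before HDFS, HDFS up before MapReduce) could be wrapped roughly as follows. This is a sketch under assumptions: the script names are the ones shipped in the bin/ directory of Apache Hadoop 0.20, the default bin_dir is a placeholder, and packaged distributions may use init scripts instead. The function defaults to a dry run that only prints what it would execute.

```python
import subprocess

# Script names from the bin/ directory of Apache Hadoop 0.20.
# Assumption: a packaged install may use init scripts instead.
STOP_ORDER = ["stop-mapred.sh", "stop-dfs.sh"]     # MapReduce first, then HDFS
START_ORDER = ["start-dfs.sh", "start-mapred.sh"]  # HDFS first, then MapReduce


def run_in_order(scripts, bin_dir="/usr/lib/hadoop/bin", dry_run=True):
    """Run the given scripts one after another and return the command list.

    bin_dir is a hypothetical default; adjust it to the local install.
    With dry_run=True nothing is executed, the commands are only printed.
    """
    commands = [f"{bin_dir}/{script}" for script in scripts]
    for cmd in commands:
        if dry_run:
            print("would run:", cmd)
        else:
            # check=True aborts the sequence if a step fails, so HDFS
            # is not stopped while MapReduce is still running.
            subprocess.run([cmd], check=True)
    return commands
```

Before a reboot one would call run_in_order(STOP_ORDER, dry_run=False), and after it run_in_order(START_ORDER, dry_run=False).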
-
Re: namenode doesn't start after reboot
Aaron T. Myers 2010-12-23, 17:15

All this aside, you really shouldn't have to "safely" stop all the Hadoop services when you reboot any of your servers. Hadoop should be able to survive a crash of any of the daemons. Any circumstance in which Hadoop currently corrupts the edits log or fsimage is a serious bug, and should be reported via JIRA.

--
Aaron T. Myers
Software Engineer, Cloudera
-
Re: namenode doesn't start after reboot
Todd Lipcon 2010-12-23, 20:02

On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle <[EMAIL PROTECTED]> wrote:
> 1. I have set up a second dfs.name.dir which is stored at another
> computer (mounted by sshfs)

I would strongly discourage the use of sshfs for the name dir. For one, it's slow, and for two, I've seen it have some really weird semantics where it's doing write-back caching.

Just take a look at its manpage and you should get scared about using it for a critical mount point like this.

A soft interruptible NFS mount is a much safer bet.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
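
When dfs.name.dir holds a comma-separated list of directories, the NameNode writes its metadata to each of them, so the local and the remote copy should match. One way to check that after a clean shutdown is sketched below. The helper names are hypothetical, the file names fsimage and edits under current/ follow the paths mentioned in this thread, and the comparison is only meaningful while the NameNode is stopped.

```python
import hashlib
from pathlib import Path


def parse_name_dirs(value):
    """Split a comma-separated dfs.name.dir value into paths."""
    return [Path(p.strip()) for p in value.split(",") if p.strip()]


def file_digest(path):
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_name_dirs(value, files=("fsimage", "edits")):
    """Map each file name to the set of digests seen across all name dirs.

    A missing file is recorded as None. More than one entry in a set
    means the copies differ.
    """
    result = {}
    for name in files:
        digests = set()
        for name_dir in parse_name_dirs(value):
            path = name_dir / "current" / name
            digests.add(file_digest(path) if path.is_file() else None)
        result[name] = digests
    return result


def copies_agree(value, files=("fsimage", "edits")):
    """True only if every file exists in every dir with identical content."""
    return all(
        len(digests) == 1 and None not in digests
        for digests in compare_name_dirs(value, files).values()
    )
```

A call such as copies_agree("/local/name,/mnt/nfs/name") (both paths are placeholders) returning False would be a reason to investigate the remote mount before relying on it as a backup.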
-
Re: namenode doesn't start after reboot
Jakob Homan 2010-12-23, 20:47

Please move discussions of CDH issues to Cloudera's lists. Thanks.
-
Re: namenode doesn't start after reboot
Todd Lipcon 2010-12-23, 21:08

On Thu, Dec 23, 2010 at 12:47 PM, Jakob Homan <[EMAIL PROTECTED]> wrote:
> Please move discussions of CDH issues to Cloudera's lists. Thanks.

Hi Jakob,

These bugs are clearly not CDH-specific. NameNode corruption bugs, and best practices with regard to the storage of NN metadata, are clearly applicable to any version of Hadoop that users may run, be it Apache, Yahoo, Facebook, 0.20, 0.21, or trunk. If you have reason to believe my suggestion you quoted below is somehow not relevant to the larger community I would love to hear it.

My understanding of the ASF goals is that we should encourage a cohesive community. Asking users of CDH to move general Hadoop questions off of ASF mailing lists just because of their choice in distros encourages a fractured community rather than a cohesive one.

Clearly, if a user has a question specifically about Cloudera packaging they should be directed to the CDH lists so as not to clutter non-CDH users' inboxes with irrelevant questions. I think if you browse the archives you'll find that Cloudera employees have been consistent about doing this since we started the cdh-user list several months ago. But if an issue is a bug that is likely to occur in trunk, it makes sense to me to leave it on the list associated with the core project.

Personally I do my best to answer questions on the ASF lists regardless of which distro the person is using. Though our distros have some divergence in backported patch sets, it's rare that a bug in one distro doesn't allow us to fix a bug in trunk. I can readily pull up several recent examples of this, and I'm surprised that there isn't more concern in the general community about bugs that may result in NN metadata corruption.

Thanks,
-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: namenode doesn't start after reboot
Bjoern Schiessle 2010-12-23, 22:05

Hi,

On Thu, 23 Dec 2010 09:15:41 -0800 Aaron T. Myers wrote:
> All this aside, you really shouldn't have to "safely" stop all the
> Hadoop services when you reboot any of your servers. Hadoop should be
> able to survive a crash of any of the daemons. Any circumstance in
> which Hadoop currently corrupts the edits log or fsimage is a serious
> bug, and should be reported via JIRA.

This is also what I would expect. Nevertheless, the last reboot caused the problem described at the beginning of the thread. During the tests today I always stopped the datanode and namenode myself, which worked flawlessly.

To be a little bit safer, I wrote my own stop script which stops the datanode and the namenode before shutdown.

Best wishes,
Björn
-
Re: namenode doesn't start after reboot
Bjoern Schiessle 2010-12-23, 22:06

On Thu, 23 Dec 2010 12:02:51 -0800 Todd Lipcon wrote:
> On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle <[EMAIL PROTECTED]> wrote:
> > 1. I have set up a second dfs.name.dir which is stored at another
> > computer (mounted by sshfs)
>
> I would strongly discourage the use of sshfs for the name dir. For one,
> it's slow, and for two, I've seen it have some really weird semantics
> where it's doing write-back caching.

Thanks for these insights. I have now switched to NFS.

Thanks,
Björn
-
Re: namenode doesn't start after reboot
Ryan Rawson 2010-12-23, 22:46

I think the bug might be related to this:

https://issues.apache.org/jira/browse/HDFS-686
and
https://issues.apache.org/jira/browse/HDFS-1002