|
Andrew Nguyen
2010-05-13, 00:19
Jeff Zhang
2010-05-13, 02:40
Andrew Nguyen
2010-05-13, 13:52
Jeff Zhang
2010-05-14, 01:51
Andrew Nguyen
2010-05-14, 15:53
Allen Wittenauer
2010-05-14, 16:17
Andrew Nguyen
2010-05-14, 20:06
Andrew Nguyen
2010-05-15, 00:13
Hemanth Yamijala
2010-05-15, 02:41
Andrew Nguyen
2010-05-15, 06:06
Andrew Nguyen
2010-05-17, 18:31
Andrew Nguyen
2010-05-17, 18:58
|
-
Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-13, 00:19
I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:

2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
    at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)

There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change. Sometimes it's slave1, sometimes it's slave4, etc.

Any thoughts?

Thanks!

--Andrew
-
Re: Setting up a second cluster and getting a weird issue
Jeff Zhang 2010-05-13, 02:40
Do these 4 nodes share NFS?
--
Best Regards
Jeff Zhang
-
Re: Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-13, 13:52
Yes, in this deployment, I'm attempting to share the hadoop files via NFS. The log and pid directories are local.
Thanks!

--Andrew
-
Re: Setting up a second cluster and getting a weird issue
Jeff Zhang 2010-05-14, 01:51
Deploying Hadoop on NFS is not recommended. The data nodes will conflict with each other, because an NFS share gives them all the same file system namespace.

--
Best Regards
Jeff Zhang
-
Re: Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-14, 15:53
Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS. I don't see how this would cause a conflict - do you have any additional information?
The path referenced in the error (/srv/hadoop/dfs/1) is not being shared via NFS.

Thanks,
Andrew
-
Re: Setting up a second cluster and getting a weird issue
Allen Wittenauer 2010-05-14, 16:17
On May 14, 2010, at 8:53 AM, Andrew Nguyen wrote:
> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS. I don't see how this would cause a conflict - do you have any additional information?
>
> The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>>>>
>>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change. Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>
>>>>> Any thoughts?

Something is deleting the contents of /srv/hadoop/dfs/1. How did you set your dfs.data.dir in the config file? Or did you just change hadoop.tmp.dir?
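For readers wondering why hadoop.tmp.dir matters here: in Hadoop releases of this era, the shipped default for dfs.data.dir is expressed in terms of hadoop.tmp.dir, so changing only the latter moves the DataNode storage too. The sketch below imitates that variable expansion in Python. It is not Hadoop code; `resolve` and `DEFAULTS` are hypothetical names, and the default values are the ones I believe the 0.20-era *-default.xml files contain, so check them against your own release.

```python
import re

# Assumed defaults from the 0.20-era *-default.xml files; verify locally.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(key, overrides, user="hadoop", max_depth=20):
    """Return the effective value of `key`, expanding ${...} references.

    `overrides` plays the role of the *-site.xml files: anything set there
    wins over DEFAULTS. Unknown variables are left unexpanded.
    """
    props = {**DEFAULTS, **overrides, "user.name": user}
    value = props[key]
    for _ in range(max_depth):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return value
        value = expanded
    raise ValueError(f"too many nested substitutions while resolving {key}")


if __name__ == "__main__":
    # Nothing overridden: storage ends up under /tmp.
    print(resolve("dfs.data.dir", {}))
    # Explicit dfs.data.dir, as in this thread.
    print(resolve("dfs.data.dir", {"dfs.data.dir": "/srv/hadoop/dfs/1"}))
```

With an explicit dfs.data.dir, as in this thread, hadoop.tmp.dir no longer affects where the DataNode writes, but any property left unset (mapred.local.dir, for example) still follows it.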
-
Re: Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-14, 20:06
I'm pretty sure I just set my dfs.data.dir to be /srv/hadoop/dfs/1

    <property>
      <name>dfs.data.dir</name>
      <value>/srv/hadoop/dfs/1</value>
    </property>

I don't have hadoop.tmp.dir set to anything so it's whatever the default is.

I don't have access to the cluster right now but will update with the exact settings when I get a chance.

I have 4 slaves with identical hardware. Each has a separate SCSI drive mounted at /srv/hadoop/dfs/1. The same config file is used across all the slaves. I know the NFS approach isn't ideal for larger deployments but right now, I'm still in the tweaking stage and figured NFS was the fastest way to propagate changes.

Thanks!
-
Re: Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-15, 00:13
My hdfs-site.xml file:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/srv/hadoop/dfs.name.dir</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/srv/hadoop/dfs/1</value>
      </property>
    </configuration>

Here is my /srv/hadoop/hadoop directory listing:

    total 5068
    drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 16:10 bin
    -rw-rw-r--  1 hadoop hadoop   73847 2010-03-21 23:17 build.xml
    drwxr-xr-x  5 hadoop hadoop    4096 2010-03-21 23:17 c++
    -rw-rw-r--  1 hadoop hadoop  348624 2010-03-21 23:17 CHANGES.txt
    drwxr-xr-x  4 hadoop hadoop    4096 2010-05-12 09:29 cloudera
    lrwxrwxrwx  1 hadoop hadoop      15 2010-05-12 15:54 conf -> ../hadoop-conf/
    drwxr-xr-x 15 hadoop hadoop    4096 2010-03-21 23:17 contrib
    drwxr-xr-x  9 hadoop hadoop    4096 2010-05-12 09:29 docs
    drwxr-xr-x  3 hadoop hadoop    4096 2010-03-21 23:17 example-confs
    -rw-rw-r--  1 hadoop hadoop    6839 2010-03-21 23:17 hadoop-0.20.2+228-ant.jar
    -rw-rw-r--  1 hadoop hadoop 2806445 2010-03-21 23:17 hadoop-0.20.2+228-core.jar
    -rw-rw-r--  1 hadoop hadoop  142466 2010-03-21 23:17 hadoop-0.20.2+228-examples.jar
    -rw-rw-r--  1 hadoop hadoop 1637240 2010-03-21 23:17 hadoop-0.20.2+228-test.jar
    -rw-rw-r--  1 hadoop hadoop   70090 2010-03-21 23:17 hadoop-0.20.2+228-tools.jar
    drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 09:29 ivy
    -rw-rw-r--  1 hadoop hadoop    9103 2010-03-21 23:17 ivy.xml
    drwxr-xr-x  5 hadoop hadoop    4096 2010-05-12 09:29 lib
    -rw-rw-r--  1 hadoop hadoop   13366 2010-03-21 23:17 LICENSE.txt
    lrwxrwxrwx  1 hadoop hadoop       8 2010-05-12 16:28 logs -> ../logs/
    drwxr-xr-x  3 hadoop hadoop    4096 2010-05-12 16:16 logs-old
    -rw-rw-r--  1 hadoop hadoop     101 2010-03-21 23:17 NOTICE.txt
    lrwxrwxrwx  1 hadoop hadoop       7 2010-05-12 16:28 pids -> ../pids
    drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 16:10 pids-old
    -rw-rw-r--  1 hadoop hadoop    1366 2010-03-21 23:17 README.txt
    drwxr-xr-x 15 hadoop hadoop    4096 2010-05-12 09:29 src
    drwxr-xr-x  8 hadoop hadoop    4096 2010-03-21 23:17 webapps

The only NFS shared directories are /srv/hadoop/hadoop and /srv/hadoop/hadoop-conf.
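Any property that a site file does not set falls back to Hadoop's built-in default, so it can help to list what a given file actually defines. The following is a minimal sketch, assuming the usual <configuration>/<property>/<name>/<value> layout shown in this message. `load_properties` and `missing_keys` are hypothetical helper names, not part of Hadoop, and the list of expected keys is only an example.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_keys(props, expected):
    """Return the expected keys that are absent and would use defaults."""
    return [key for key in expected if key not in props]


# Inline copy of the hdfs-site.xml from this message, used as sample input.
SAMPLE = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.name.dir</name><value>/srv/hadoop/dfs.name.dir</value></property>
  <property><name>dfs.data.dir</name><value>/srv/hadoop/dfs/1</value></property>
</configuration>"""

if __name__ == "__main__":
    props = load_properties(SAMPLE)
    for name, value in sorted(props.items()):
        print(f"{name} = {value}")
    print("not set here:", missing_keys(props, ["dfs.data.dir", "mapred.local.dir"]))
```

Running the same check against each node's copy of the files (read from disk rather than from the inline sample) is one way to confirm that every slave sees the same settings.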
-
Re: Setting up a second cluster and getting a weird issue
Hemanth Yamijala 2010-05-15, 02:41
Andrew,

> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS. I don't see how this would cause a conflict - do you have any additional information?

FWIW, we had an experience where we were storing config files on NFS on a large cluster. Randomly (and, we guess, due to NFS problems), Hadoop would fail to pick up the config files on NFS and use its defaults instead. Because the defaults for some directory paths differed from our configured values, this resulted in very odd errors. We were eventually able to solve the problem by moving the config files off NFS. Of course, the size of the cluster (several hundred slaves) was probably a factor. Nevertheless, you may want to try pulling everything off NFS.

Thanks
Hemanth
-
Re: Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-15, 06:06
Yeah, I tried some more experiments today and the error messages were more helpful. It does seem that some of the values were defaulting to ones very different from what I had configured.
I have been looking into Puppet but figured with 4 slaves, it shouldn't be a problem to use NFS. Guess I was wrong!

Thanks all,
Andrew
-
Re: Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-17, 18:31
So I pulled everything off NFS and I'm still getting the original error with a FileNotFoundException for current/VERSION.
I only have 4 slaves and scp'ed the Hadoop directory to all 4 slaves.

Any other ideas?
-
Re: Setting up a second cluster and getting a weird issue
Andrew Nguyen 2010-05-17, 18:58
Sorry for bothering everyone, I accidentally configured my dfs.data.dir and mapred.local.dir to the same directory... Bad copy/paste job.
Thanks for everyone's help! |
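A mistake like this one can be caught before starting the daemons by comparing the two settings. The thread does not spell out the mechanism, but a plausible reading is that the MapReduce side cleans out its local directory and takes the DataNode's current/VERSION file with it. The sketch below is only an illustration: `split_dirs` and `overlapping_dirs` are hypothetical helpers, and flagging nested directories as well as identical ones is my own assumption about what counts as a clash.

```python
import posixpath


def split_dirs(value):
    """Split a comma-separated Hadoop directory list into normalized paths."""
    return [posixpath.normpath(p.strip()) for p in value.split(",") if p.strip()]


def overlapping_dirs(first, second):
    """Return (a, b) pairs where one directory equals or contains the other."""
    clashes = []
    for a in split_dirs(first):
        for b in split_dirs(second):
            if a == b or b.startswith(a + "/") or a.startswith(b + "/"):
                clashes.append((a, b))
    return clashes


if __name__ == "__main__":
    dfs_data_dir = "/srv/hadoop/dfs/1"
    mapred_local_dir = "/srv/hadoop/dfs/1"  # the copy/paste mistake
    for a, b in overlapping_dirs(dfs_data_dir, mapred_local_dir):
        print(f"dfs.data.dir {a} overlaps mapred.local.dir {b}")
```

Both properties accept comma-separated lists, which is why the values are split before comparison.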