|
Brian Wolf
2010-01-30, 08:27
Aaron Kimball
2010-01-31, 21:54
Brian Wolf
2010-02-01, 21:59
Alex Kozlov
2010-02-02, 04:19
Brian Wolf
2010-02-03, 22:17
Ed Mazur
2010-02-03, 23:07
Alex Kozlov
2010-02-04, 02:30
Brian Wolf
2010-02-04, 02:49
Brian Wolf
2010-02-04, 04:26
Alex Kozlov
2010-02-04, 05:12
Brian Wolf
2010-03-13, 01:20
Brian Wolf
2010-03-13, 19:34
Alex Kozlov
2010-03-13, 20:44
Brian Wolf
2010-03-18, 00:37
|
-
hadoop under cygwin issue
Brian Wolf 2010-01-30, 08:27

Hi,

I am trying to run Hadoop 0.19.2 under cygwin as per directions on the hadoop "quickstart" web page.

I know sshd is running and I can "ssh localhost" without a password.

This is from my hadoop-site.xml

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/cygwin/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>-1</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>webinterface.private.actions</name>
<value>true</value>
</property>
</configuration>

These are errors from my log files:

2010-01-30 00:03:33,091 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2010-01-30 00:03:33,121 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:9000
2010-01-30 00:03:33,161 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-01-30 00:03:33,181 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-01-30 00:03:34,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=brian,None,Administrators,Users
2010-01-30 00:03:34,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-01-30 00:03:34,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2010-01-30 00:03:34,653 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-01-30 00:03:34,653 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-01-30 00:03:34,803 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory C:\cygwin\tmp\hadoop-brian\dfs\name does not exist.
2010-01-30 00:03:34,813 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory C:\cygwin\tmp\hadoop-brian\dfs\name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:278)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:288)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
2010-01-30 00:03:34,823 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000

========================================================

2010-01-29 15:13:30,270 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
problem cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:724)
    at org.apache.hadoop.ipc.Client.call(Client.java:700)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy4.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)

Thanks
Brian
-
Re: hadoop under cygwin issue
Aaron Kimball 2010-01-31, 21:54

Brian, it looks like you missed a step in the instructions. You'll need to format the HDFS filesystem instance before starting the NameNode server.

You need to run:

$ bin/hadoop namenode -format

.. then you can do bin/start-dfs.sh

Hope this helps,
- Aaron
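The InconsistentFSStateException in the first message is what an unformatted NameNode storage directory looks like at startup. A minimal sketch of a pre-start check follows. It assumes that formatting leaves a current/VERSION file inside the storage directory; the helper name name_dir_status and the example path are illustrative and not part of Hadoop.

```shell
#!/bin/sh
# name_dir_status DIR (hypothetical helper, not a Hadoop command):
# prints "formatted" if DIR looks like a formatted NameNode storage
# directory, "unformatted" otherwise.
# Assumption: "namenode -format" leaves a current/VERSION file in DIR.
name_dir_status() {
    if [ -f "$1/current/VERSION" ]; then
        echo formatted
    else
        echo unformatted
    fi
}

# Example path is an assumption: hadoop.tmp.dir followed by dfs/name.
NAME_DIR="${NAME_DIR:-/tmp/hadoop-${USER:-user}/dfs/name}"

if [ "$(name_dir_status "$NAME_DIR")" = unformatted ]; then
    echo "Not formatted yet; run: bin/hadoop namenode -format"
else
    echo "Already formatted; run: bin/start-dfs.sh"
fi
```

Formatting an existing directory discards the HDFS metadata stored there, so a check like this is mainly useful to avoid reformatting by accident.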
-
Re: hadoop under cygwin issue
Brian Wolf 2010-02-01, 21:59

Aaron,

Thanks for your help. I carefully went through the steps again a couple of times, and ran

bin/hadoop namenode -format

(by the way, it asks if I want to reformat, I've tried it both ways)

then

bin/start-dfs.sh

and

bin/start-all.sh

and then

bin/hadoop fs -put conf input

Now the return for this seemed cryptic:

put: Target input/conf is a directory

(??)

and when I tried

bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

it says something about 0 nodes (from the log file):

2010-02-01 13:26:29,874 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=brian,None,Administrators,Users ip=/127.0.0.1 cmd=create src=/cygwin/tmp/hadoop-SYSTEM/mapred/system/job_201002011323_0001/job.jar dst=null perm=brian:supergroup:rw-r--r--
2010-02-01 13:26:30,045 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000, call addBlock(/cygwin/tmp/hadoop-SYSTEM/mapred/system/job_201002011323_0001/job.jar, DFSClient_725490811) from 127.0.0.1:3003: error: java.io.IOException: File /cygwin/tmp/hadoop-SYSTEM/mapred/system/job_201002011323_0001/job.jar could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /cygwin/tmp/hadoop-SYSTEM/mapred/system/job_201002011323_0001/job.jar could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1287)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

To maybe rule out something regarding ports or ssh, when I run netstat:

TCP 127.0.0.1:9000 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9001 0.0.0.0:0 LISTENING

and when I browse to http://localhost:50070/

Cluster Summary

21 files and directories, 0 blocks = 21 total. Heap Size is 8.01 MB / 992.31 MB (0%)

Configured Capacity : 0 KB
DFS Used : 0 KB
Non DFS Used : 0 KB
DFS Remaining : 0 KB
DFS Used% : 100 %
DFS Remaining% : 0 %
Live Nodes <http://localhost:50070/dfshealth.jsp#LiveNodes> : 0
Dead Nodes <http://localhost:50070/dfshealth.jsp#DeadNodes> : 0

so I'm still a bit in the dark, I guess.

Thanks
Brian
-
Re: hadoop under cygwin issue
Alex Kozlov 2010-02-02, 04:19

Live Nodes <http://localhost:50070/dfshealth.jsp#LiveNodes> : 0

Your datanode is dead. Look at the logs in the $HADOOP_HOME/logs directory (or wherever your logs are) and check the errors.

Alex K
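Checking the logs for a datanode that never registered can be reduced to filtering for ERROR and FATAL entries. The sketch below is one way to do that, assuming the log4j line layout seen in this thread (timestamp, level, class) and log files ending in .log; the helper name scan_logs is illustrative. A commonly reported cause after reformatting the NameNode is a namespace ID mismatch between NameNode and DataNode storage, though nothing in this thread confirms that was the case here.

```shell
#!/bin/sh
# scan_logs DIR (hypothetical helper): print ERROR and FATAL lines,
# prefixed with the file name, from every *.log file in DIR.
# Returns non-zero when nothing matches or DIR has no *.log files.
scan_logs() {
    grep -H -E ' (ERROR|FATAL) ' "$1"/*.log 2>/dev/null
}

# Assumption: logs live under $HADOOP_HOME/logs, as suggested above.
LOG_DIR="${HADOOP_HOME:-.}/logs"
scan_logs "$LOG_DIR" || echo "no ERROR/FATAL lines found in $LOG_DIR"
```

Running bin/hadoop dfsadmin -report afterwards gives the same live-node count as the web page on port 50070, which is a quick way to confirm whether the datanode has come back.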
-
Re: hadoop under cygwin issue
Brian Wolf 2010-02-03, 22:17

Thanks for your help, Alex,

I managed to get past that problem, but now I have another one. When I try to run this example as stated on the quickstart webpage:

bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

I get this error:

============================================================
java.io.IOException: Not a file: hdfs://localhost:9000/user/brian/input/conf
============================================================

So it seems to default to my home directory when looking for "input". It apparently needs an absolute filepath; however, when I run it that way:

$ bin/hadoop jar hadoop-*-examples.jar grep /usr/local/hadoop-0.19.2/input output 'dfs[a-z.]+'

=============================================================
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/usr/local/hadoop-0.19.2/input
=============================================================

It still isn't happy, although this part -> /usr/local/hadoop-0.19.2/input <- does exist.
-
Re: hadoop under cygwin issue
Ed Mazur 2010-02-03, 23:07

Brian,

It looks like you're confusing your local file system with HDFS. HDFS sits on top of your file system and is where data for (non-standalone) Hadoop jobs comes from. You can poll it with "fs -ls ...", so do something like "hadoop fs -lsr /" to see everything in HDFS. This will probably shed some light on why your first attempt failed. /user/brian/input should be a directory with several xml files.

Ed
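Both error messages earlier in the thread follow from how a path argument is resolved when fs.default.name points at HDFS. The sketch below mimics that resolution in plain shell, assuming the hdfs://localhost:9000 value from the configuration in the first message; hdfs_resolve is an illustrative helper, not a Hadoop command.

```shell
#!/bin/sh
# hdfs_resolve USER PATH (hypothetical helper): print where PATH would
# point when fs.default.name is hdfs://localhost:9000.
hdfs_resolve() {
    case "$2" in
        *://*) echo "$2" ;;                               # explicit scheme, used as given
        /*)    echo "hdfs://localhost:9000$2" ;;           # absolute: from the HDFS root
        *)     echo "hdfs://localhost:9000/user/$1/$2" ;;  # relative: HDFS home directory
    esac
}

hdfs_resolve brian input
# -> hdfs://localhost:9000/user/brian/input
hdfs_resolve brian /usr/local/hadoop-0.19.2/input
# -> hdfs://localhost:9000/usr/local/hadoop-0.19.2/input
```

The second result is a path in HDFS, not on the local disk, which is why it was reported as missing even though the local directory exists. If "hadoop fs -lsr /" shows a conf subdirectory under /user/brian/input, repeated "fs -put conf input" runs are a plausible cause of the "Not a file" error; removing the directory with bin/hadoop fs -rmr input and uploading once is one way to start clean.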
-
Re: hadoop under cygwin issue
Alex Kozlov 2010-02-04, 02:30

Try

$ bin/hadoop jar hadoop-*-examples.jar grep file:///usr/local/hadoop-0.19.2/input output 'dfs[a-z.]+'

file:/// is a magical prefix to force hadoop to look for the file in the local FS.

You can also force it to look into the local FS by giving the '-fs local' or '-fs file:///' option to the hadoop executable.

These options basically override the *fs.default.name* configuration setting, which should be in your core-site.xml file (hadoop-site.xml on 0.19.x).

You can also copy the content of the input directory to HDFS by executing

$ bin/hadoop fs -mkdir input
$ bin/hadoop fs -copyFromLocal input/* input

Hope this helps

Alex K
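The two routes described above (read from the local file system, or copy into HDFS first) can be laid out side by side. This is a sketch only: the run wrapper, the MODE variable, and the two function names are illustrative, and the wrapper merely prints each command unless the launcher at $HADOOP exists, so it is safe to try outside a Hadoop install.

```shell
#!/bin/sh
# HADOOP is an assumption: path to the launcher, relative to the install dir.
HADOOP="${HADOOP:-bin/hadoop}"

# run CMD...: hypothetical dry-run wrapper. It always prints the command
# and executes it only if the launcher is present and executable.
run() {
    echo "+ $*"
    if [ -x "$HADOOP" ]; then
        "$@"
    fi
}

# Route 1: read the input straight from the local file system.
grep_from_local() {
    run "$HADOOP" jar hadoop-*-examples.jar grep \
        file:///usr/local/hadoop-0.19.2/input output 'dfs[a-z.]+'
}

# Route 2: copy the local input directory into HDFS, then use the
# relative path, which resolves inside the user's HDFS home directory.
copy_then_grep() {
    run "$HADOOP" fs -mkdir input
    run "$HADOOP" fs -copyFromLocal input/* input
    run "$HADOOP" jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
}

# Pick one route; both write to "output", which must not exist beforehand.
case "${MODE:-local}" in
    local) grep_from_local ;;
    hdfs)  copy_then_grep ;;
esac
```

Setting MODE=hdfs selects the second route. The first route is handy for a quick test, while the second matches how the quickstart example expects its input to be staged.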
-
Re: hadoop under cygwin issue
Brian Wolf 2010-02-04, 02:49

Thanks for the insight, Ed. That's actually a pretty big "gestalt" for me, I have to process it a bit (I had read about it, of course)

Brian
-
Re: hadoop under cygwin issue Brian Wolf 2010-02-04, 04:26
Alex, thanks for the help, it seems to start now, however
$ bin/hadoop jar hadoop-*-examples.jar grep -fs local input output 'dfs[a-z.]+'
10/02/03 20:02:41 WARN fs.FileSystem: "local" is a deprecated filesystem name. Use "file:///" instead.
10/02/03 20:02:43 INFO mapred.FileInputFormat: Total input paths to process : 3
10/02/03 20:02:44 INFO mapred.JobClient: Running job: job_201002031354_0013
10/02/03 20:02:45 INFO mapred.JobClient: map 0% reduce 0%

It hangs here (is the pseudo cluster supposed to work?)

These are the bottoms of the various log files:

conf log file

<property><name>fs.s3.impl</name><value>org.apache.hadoop.fs.s3.S3FileSystem</value></property>
<property><name>mapred.input.dir</name><value>file:/C:/OpenSSH/usr/local/hadoop-0.19.2/input</value></property>
<property><name>mapred.job.tracker.http.address</name><value>0.0.0.0:50030</value></property>
<property><name>io.file.buffer.size</name><value>4096</value></property>
<property><name>mapred.jobtracker.restart.recover</name><value>false</value></property>
<property><name>io.serializations</name><value>org.apache.hadoop.io.serializer.WritableSerialization</value></property>
<property><name>dfs.datanode.handler.count</name><value>3</value></property>
<property><name>mapred.reduce.copy.backoff</name><value>300</value></property>
<property><name>mapred.task.profile</name><value>false</value></property>
<property><name>dfs.replication.considerLoad</name><value>true</value></property>
<property><name>jobclient.output.filter</name><value>FAILED</value></property>
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
<property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value></property>
<property><name>fs.checkpoint.size</name><value>67108864</value></property>

bottom
namenode log

added to blk_6520091160827873550_1036 size 570
2010-02-03 20:02:43,826 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=brian,None,Administrators,Users ip=/127.0.0.1 cmd=create src=/cygwin/tmp/hadoop-brian/mapred/system/job_201002031354_0013/job.xml dst=null perm=brian:supergroup:rw-r--r--
2010-02-03 20:02:43,866 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=brian,None,Administrators,Users ip=/127.0.0.1 cmd=setPermission src=/cygwin/tmp/hadoop-brian/mapred/system/job_201002031354_0013/job.xml dst=null perm=brian:supergroup:rw-r--r--
2010-02-03 20:02:44,026 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /cygwin/tmp/hadoop-brian/mapred/system/job_201002031354_0013/job.xml. blk_517844159758473296_1037
2010-02-03 20:02:44,076 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_517844159758473296_1037 size 16238
2010-02-03 20:02:44,257 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=brian,None,Administrators,Users ip=/127.0.0.1 cmd=open src=/cygwin/tmp/hadoop-brian/mapred/system/job_201002031354_0013/job.xml dst=null perm=null
2010-02-03 20:02:44,527 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=brian,None,Administrators,Users ip=/127.0.0.1 cmd=open src=/cygwin/tmp/hadoop-brian/mapred/system/job_201002031354_0013/job.jar dst=null perm=null
2010-02-03 20:02:45,258 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=brian,None,Administrators,Users ip=/127.0.0.1 cmd=open src=/cygwin/tmp/hadoop-brian/mapred/system/job_201002031354_0013/job.split dst=null perm=null

bottom
datanode log

2010-02-03 20:02:44,046 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_517844159758473296_1037 src: /127.0.0.1:4069 dest: /127.0.0.1:50010
2010-02-03 20:02:44,076 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:4069, dest: /127.0.0.1:50010, bytes: 16238, op: HDFS_WRITE, cliID: DFSClient_-1424524646, srvID: DS-1812377383-192.168.1.5-50010-1265088397104, blockid: blk_517844159758473296_1037
2010-02-03 20:02:44,086 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_517844159758473296_1037 terminating
2010-02-03 20:02:44,457 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:4075, bytes: 16366, op: HDFS_READ, cliID: DFSClient_-548531246, srvID: DS-1812377383-192.168.1.5-50010-1265088397104, blockid: blk_517844159758473296_1037
2010-02-03 20:02:44,677 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:4076, bytes: 135168, op: HDFS_READ, cliID: DFSClient_-548531246, srvID: DS-1812377383-192.168.1.5-50010-1265088397104, blockid: blk_-2806977820057440405_1035
2010-02-03 20:02:45,278 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:4077, bytes: 578, op: HDFS_READ, cliID: DFSClient_-548531246, srvID: DS-1812377383-192.168.1.5-50010-1265088397104, blockid: blk_6520091160827873550_1036
2010-02-03 20:04:10,451 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_3301977249866081256_1031
2010-02-03 20:09:35,658 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_9116729021606317943_1025
2010-02-03 20:09:44,671 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8602436668984954947_1026

jobtracker log

Input size for job job_201002031354_0012 = 53060
2010-02-03 19:48:37,599 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_201002031354_0012
2010-02-03 19:48:37,649 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/localhost
2010-02-03 19:48:37,659 INFO org.apache.hadoop.mapred.JobInProg
-
Re: hadoop under cygwin issue Alex Kozlov 2010-02-04, 05:12
Can you try a simpler job (just to make sure your setup works):
$ hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar pi 2 2

Alex K
-
Re: hadoop under cygwin issue Brian Wolf 2010-03-13, 01:20
Hi Alex,
I am back on this problem. It seems to work, but I have this issue connecting to the server. I can connect with 'ssh localhost' OK.

Thanks
Brian

$ bin/hadoop jar hadoop-*-examples.jar pi 2 2
Number of Maps = 2
Samples per Map = 2
10/03/12 17:16:17 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
10/03/12 17:16:19 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
-
Re: hadoop under cygwin issue Brian Wolf 2010-03-13, 19:34
Hi Alex,
It seems to be:

$ bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x - brian supergroup 0 2010-03-13 10:45 /tmp

However, I think this might be the source of the problems: whenever I invoke any of the scripts, I always get these issues:

localhost: /usr/bin/bash: /usr/local/hadoop-0.20.2/bin/hadoop-daemon.sh: No such file or directory

I'm thinking this has something to do with cygwin (?). I've been careful not to open these files with a Windows editor (I've already been through that headache!). I guess I have been ignoring this error, but I think whatever hadoop-daemon is supposed to do isn't getting done.

However, I have tried to invoke it by hand, by echoing out what I guess the arguments are supposed to be, like "hadoop-daemon start datanode", but that doesn't seem to work. (Also, is there a minimum amount of hard disk space required? I have only 1 gig or so free.)

For example, after I run start-all.sh, I run:

$ bin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-brian-datanode-wynn6266448332.out

OK, but then when I try to run the grep example, I get these errors:

2010-03-13 11:27:57,149 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
2010-03-13 11:27:57,149 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-SYSTEM/mapred/system/jobtracker.info" - Aborting...
2010-03-13 11:27:57,149 WARN org.apache.hadoop.mapred.JobTracker: Writing to file hdfs://localhost:9000/tmp/hadoop-SYSTEM/mapred/system/jobtracker.info failed!
2010-03-13 11:27:57,149 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!
2010-03-13 11:27:57,369 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-SYSTEM/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
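The log symptoms that have come up in this thread can be gathered into a small lookup, which may help when scanning a long log. This is only a sketch: `SYMPTOMS` and `diagnose` are hypothetical names, the search strings are copied from log lines quoted in the thread, and the hints merely paraphrase the advice given in the replies. It is not an exhaustive or authoritative list of causes.

```python
# Hypothetical symptom table. The substrings come from log lines quoted in
# this thread; the hints paraphrase the replies and are not definitive.
SYMPTOMS = [
    ("could only be replicated to 0 nodes",
     "no live datanode: check the datanode log under $HADOOP_HOME/logs"),
    ("Retrying connect to server",
     "namenode not reachable: check that it is running, e.g. 'hadoop fs -ls /'"),
    ("Input path does not exist",
     "input is looked up in HDFS, not on the local disk: inspect it with 'hadoop fs -lsr /'"),
]


def diagnose(lines):
    """Return the hints whose symptom text appears in any of the log lines."""
    hints = []
    for needle, hint in SYMPTOMS:
        if any(needle in line for line in lines):
            hints.append(hint)
    return hints


sample = [
    "2010-03-13 11:27:57,149 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!",
    "java.io.IOException: File /tmp/hadoop-SYSTEM/mapred/system/jobtracker.info "
    "could only be replicated to 0 nodes, instead of 1",
]
for hint in diagnose(sample):
    print(hint)
```

For the sample lines taken from the message above, only the first hint matches, which points back at the datanode rather than at the job itself.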
-
Re: hadoop under cygwin issue Alex Kozlov 2010-03-13, 20:44
Hi Brian,
Is your namenode running? Try 'hadoop fs -ls /'.

Alex
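As a complement to 'hadoop fs -ls /', the "Retrying connect to server: localhost/127.0.0.1:9000" messages can be checked at the socket level. The sketch below is not part of Hadoop: `port_open` is a hypothetical helper, and port 9000 is assumed from the fs.default.name value (hdfs://localhost:9000) given at the start of the thread. It only reports whether something accepts TCP connections on that port, not whether that something is a healthy namenode.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Assumed namenode RPC address, taken from fs.default.name in the thread.
    print("namenode port reachable:", port_open("localhost", 9000))
```

If this prints False right after running the start scripts, the namenode process most likely never came up, and its log file is the next place to look.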
-
Re: hadoop under cygwin issue Brian Wolf 2010-03-18, 00:37
Hi Alex,

I'm using a different system; it seems to be running better now.

Thanks
Brian
|