Jay Vyas
2011-10-28, 05:52
Harsh J
2011-10-28, 06:03
Arpit Gupta
2011-10-28, 06:05
Jay Vyas
2011-10-28, 23:04
Tom Melendez
2011-10-29, 00:24
Jay Vyas
2011-10-29, 02:57
Tom Melendez
2011-10-29, 03:41
Alex Gauthier
2011-10-29, 03:43
Alex Gauthier
2011-10-29, 03:43
JAX
2011-10-29, 04:16
Alex Gauthier
2011-10-29, 04:17
JAX
2011-10-29, 04:19
-
writing to hdfs via java api
Jay Vyas, 2011-10-28, 05:52
I found a way to connect to Hadoop via hftp, and it works fine (read-only):

    String uri = "hftp://172.16.xxx.xxx:50070/";
    System.out.println( "uri: " + uri );
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get( URI.create( uri ), conf );
    fs.printStatistics();

However, hftp appears to be read-only, and I want to read, write, and copy files; that is, I want to connect over hdfs. How can I enable hdfs connections so that I can edit the actual remote filesystem using the file / path APIs? Are there ssh settings that have to be set before I can do this?

I tried changing the protocol above from "hftp" to "hdfs", but I got the following exception:

    Exception in thread "main" java.io.IOException: Call to /172.16.112.131:50070 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
        at sb.HadoopRemote.main(HadoopRemote.java:24)
-
Re: writing to hdfs via java api
Harsh J, 2011-10-28, 06:03
Jay,
Using the hdfs:// scheme is the right way, as you have determined. However, there are a few things you need to ensure while using the Java FileSystem API for your HDFS tasks:

- Connect to the NameNode's RPC port, not the web port. The default RPC port is usually 8020, but your fs.default.name config will tell you the right one.
- Do your client and server Hadoop versions match exactly? If not, make them match, since mismatched versions can run into protocol incompatibilities.
- Ensure your client can connect to the RPC ports of both the NameNode and the DataNode for reads and writes. If there's a firewall, you may need to configure it to allow this.

-- Harsh J
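Harsh's first point can be made concrete with a small helper that builds the hdfs:// URI from the NameNode host and its RPC port, falling back to 8020 and refusing 50070, the web port that produced the EOFException in the first stack trace. This is a sketch under those assumptions: `NameNodeUri` and `forRpc` are hypothetical names, not Hadoop classes, and the authoritative port is whatever fs.default.name says.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Hadoop): builds the hdfs:// URI for the
 * NameNode's RPC port instead of the web port that hftp:// talks to.
 */
public final class NameNodeUri {
    // Port numbers as mentioned in this thread: 8020 is the usual NameNode
    // RPC port, 50070 is the NameNode web port.
    static final int DEFAULT_RPC_PORT = 8020;
    static final int WEB_PORT = 50070;

    private NameNodeUri() {}

    /**
     * Returns hdfs://host:port/. A non-positive port falls back to 8020;
     * the web port is rejected because the RPC client cannot talk to it.
     */
    public static URI forRpc(String host, int port) {
        if (port == WEB_PORT) {
            throw new IllegalArgumentException("port " + port
                    + " is the NameNode web port; use the RPC port from fs.default.name");
        }
        int effectivePort = port > 0 ? port : DEFAULT_RPC_PORT;
        return URI.create("hdfs://" + host + ":" + effectivePort + "/");
    }

    public static void main(String[] args) {
        System.out.println(forRpc("172.16.112.131", 0)); // hdfs://172.16.112.131:8020/
        try {
            forRpc("172.16.112.131", 50070);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The resulting URI would go into `FileSystem.get(uri, conf)` in place of the hftp URI from the original snippet.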
-
Re: writing to hdfs via java api
Arpit Gupta, 2011-10-28, 06:05
The hdfs scheme should work, but you will have to change the port. To find the correct port number, look for the fs.default.name property in core-site.xml; the NameNode UI should also state the port.
-- Arpit
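Arpit's suggestion of reading fs.default.name from core-site.xml can be sketched with only the JDK's DOM parser, which is handy for checking a copied config file without Hadoop on the classpath. `CoreSiteReader` is a hypothetical name and the sample XML in `main` is made up; a real Hadoop client would instead let `Configuration` load the file (via `conf.addResource(...)` with a `Path` to it) and then call `conf.get("fs.default.name")`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Minimal sketch: reads one property out of a Hadoop-style
 * <configuration><property><name/><value/></property></configuration> file.
 */
public final class CoreSiteReader {
    private CoreSiteReader() {}

    /** Returns the trimmed value of the named property, or null if absent. */
    public static String property(InputStream xml, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (wanted.equals(text(prop, "name"))) {
                    return text(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            // Parser setup, malformed XML, or I/O failure.
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property><name>fs.default.name</name>"
                + "<value>hdfs://namenode.example.com:8020</value></property></configuration>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(property(in, "fs.default.name")); // hdfs://namenode.example.com:8020
    }
}
```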
-
Re: writing to hdfs via java api
Jay Vyas, 2011-10-28, 23:04
Hi guys: I've made more progress debugging my Hadoop connection, but still haven't got it working. It looks like my VM (Cloudera Hadoop) won't let me in. Connecting to the NameNode over hftp on port 50070 works without issue:

    // This method works fine: it connects directly to Hadoop's NameNode and queries the filesystem
    public static void main1(String[] args) throws Exception
    {
        String uri = "hftp://155.37.101.76:50070/";
        System.out.println( "uri: " + uri );
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get( URI.create( uri ), conf );
        fs.printStatistics();
    }

Unfortunately, I can't get in over hdfs. Any thoughts on this? I am changing the URI to port 8020, which is what is in my core-site.xml.

    // This fails: it tries to connect over and over again and eventually gives up,
    // printing "already tried to connect 20 times"
    public static void main(String[] args)
    {
        try {
            String uri = "hdfs://155.37.101.76:8020/";
            System.out.println( "uri: " + uri );
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get( URI.create( uri ), conf );
            fs.printStatistics();
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

The error message is:

    11/10/28 19:03:38 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 0 time(s).
    11/10/28 19:03:39 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 1 time(s).
    11/10/28 19:03:40 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 2 time(s).
    11/10/28 19:03:41 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 3 time(s).

Any thoughts on this would *really* be appreciated. Thanks, guys.
-
Re: writing to hdfs via java api
Tom Melendez, 2011-10-29, 00:24
Hi Jay,
Some questions for you:

- Does the hadoop client itself work from that same machine?
- Are you actually able to run the hadoop example jar (in other words, is your setup otherwise valid)?
- Is port 8020 actually available? (Can you telnet or nc to it?)
- What does jps show on the namenode?

Thanks,
Tom
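Tom's third question (can you telnet or nc to 8020?) can also be answered from Java, which exercises the same network path the client program uses. The sketch below is a rough equivalent of `nc -z host port`; `PortProbe` is a hypothetical name, and the default host in `main` is simply the address from Jay's message. A successful connect only shows that something accepts TCP connections on that port, not that it is a compatible NameNode.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Rough Java equivalent of `nc -z host port` (hypothetical helper). */
public final class PortProbe {
    private PortProbe() {}

    /** True if a TCP connection to host:port opens within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, unknown host, and so on.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "155.37.101.76";
        // NameNode RPC and web ports as discussed in this thread.
        int[] ports = {8020, 50070};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable? " + isReachable(host, port, 2000));
        }
    }
}
```

If 50070 answers but 8020 does not, the NameNode RPC port is not reachable from the client, which matches the retry loop in Jay's log.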
-
Re: writing to hdfs via java api
Jay Vyas, 2011-10-29, 02:57
Thanks, Tom: that's interesting.
First I tried, and it complained that the input directory didn't exist, so I ran

    $> hadoop fs -mkdir /user/cloudera/input

Then I tried this:

    $> hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar grep input output2 'dfs[a-z.]+'

It seemed to start working, but then it abruptly printed "Killed" at the end of the job (scroll down). Maybe this is related to why I can't connect?

1) The hadoop jar output:

    11/10/14 21:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    11/10/14 21:34:43 WARN snappy.LoadSnappy: Snappy native library not loaded
    11/10/14 21:34:43 INFO mapred.FileInputFormat: Total input paths to process : 0
    11/10/14 21:34:44 INFO mapred.JobClient: Running job: job_201110142010_0009
    11/10/14 21:34:45 INFO mapred.JobClient: map 0% reduce 0%
    11/10/14 21:34:55 INFO mapred.JobClient: map 0% reduce 100%
    11/10/14 21:34:57 INFO mapred.JobClient: Job complete: job_201110142010_0009
    11/10/14 21:34:57 INFO mapred.JobClient: Counters: 14
    11/10/14 21:34:57 INFO mapred.JobClient: Job Counters
    11/10/14 21:34:57 INFO mapred.JobClient: Launched reduce tasks=1
    11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5627
    11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=5050
    11/10/14 21:34:57 INFO mapred.JobClient: FileSystemCounters
    11/10/14 21:34:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=53452
    11/10/14 21:34:57 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=86
    11/10/14 21:34:57 INFO mapred.JobClient: Map-Reduce Framework
    11/10/14 21:34:57 INFO mapred.JobClient: Reduce input groups=0
    11/10/14 21:34:57 INFO mapred.JobClient: Combine output records=0
    11/10/14 21:34:57 INFO mapred.JobClient: Reduce shuffle bytes=0
    11/10/14 21:34:57 INFO mapred.JobClient: Reduce output records=0
    11/10/14 21:34:57 INFO mapred.JobClient: Spilled Records=0
    11/10/14 21:34:57 INFO mapred.JobClient: Combine input records=0
    11/10/14 21:34:57 INFO mapred.JobClient: Reduce input records=0
    11/10/14 21:34:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/10/14 21:34:58 INFO mapred.FileInputFormat: Total input paths to process : 1
    11/10/14 21:34:58 INFO mapred.JobClient: Running job: job_201110142010_0010
    11/10/14 21:34:59 INFO mapred.JobClient: map 0% reduce 0%
    Killed

Jay Vyas
MMSB/UCHC
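The `hadoop fs -mkdir /user/cloudera/input` step is why the relative `input` argument was then accepted: HDFS resolves relative paths against the client's working directory, which defaults to the user's home directory, `/user/<username>` (here presumably the `cloudera` user). The sketch below imitates that rule with `java.net.URI` for illustration only; it does not contact HDFS, and `HdfsPathResolution` is a hypothetical name.

```java
import java.net.URI;

/**
 * Illustration only: resolves a path the way an HDFS client resolves it
 * against the default working directory /user/<username>. No HDFS access.
 */
public final class HdfsPathResolution {
    private HdfsPathResolution() {}

    public static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path; // already absolute, used as-is
        }
        // Trailing slash so that resolve() appends to the home directory.
        URI home = URI.create("/user/" + user + "/");
        // URI.resolve also normalizes "." and ".." segments.
        return home.resolve(path).getPath();
    }

    public static void main(String[] args) {
        System.out.println(resolve("input", "cloudera"));   // /user/cloudera/input
        System.out.println(resolve("output2", "cloudera")); // /user/cloudera/output2
        System.out.println(resolve("/tmp/x", "cloudera"));  // /tmp/x
    }
}
```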
-
Re: writing to hdfs via java api
Tom Melendez, 2011-10-29, 03:41
Hi Jay,
Are you able to look at the logs or the web interface? Can you find out why it's getting killed?

Also, can you verify that these ports are open and that a process is listening on them (maybe with netstat)?

http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/

Thanks,
Tom
-
Re: writing to hdfs via java api
Alex Gauthier, 2011-10-29, 03:43
-
Re: writing to hdfs via java api
Alex Gauthier, 2011-10-29, 03:43
Brutal Friday night. Coding < pussy.
:)
-
Re: writing to hdfs via java api
JAX, 2011-10-29, 04:16
Yup.... Brutal :-|
but you never regret fixing a bug ... Unlike -------
-
Re: writing to hdfs via java api
Alex Gauthier, 2011-10-29, 04:17
Touché my friend... if only I could only.... :)
On Fri, Oct 28, 2011 at 9:16 PM, JAX <[EMAIL PROTECTED]> wrote:
> Yup.... Brutal :-|
> but you never regret fixing a bug ... Unlike -------
>
> Sent from my iPad
>
> On Oct 28, 2011, at 11:43 PM, Alex Gauthier <[EMAIL PROTECTED]> wrote:
>
> > Brutal Friday night. Coding < pussy.
> >
> > :)
> >
> > On Fri, Oct 28, 2011 at 8:43 PM, Alex Gauthier <[EMAIL PROTECTED]> wrote:
> >
> >>
> >>
> >> On Fri, Oct 28, 2011 at 8:41 PM, Tom Melendez <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi Jay,
> >>>
> >>> Are you able to look at the logs or the web interface? Can you find
> >>> out why it's getting killed?
> >>>
> >>> Also, can you verify that these ports are open and a process is
> >>> connected to them (maybe with netstat)?
> >>>
> >>> http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
> >>>
> >>> Thanks,
> >>>
> >>> Tom
> >>>
> >>> On Fri, Oct 28, 2011 at 7:57 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> >>>> Thanks tom : Thats interesting....
> >>>>
> >>>> First, I tried, and it complained that the input directory didnt exist, so I
> >>>> ran
> >>>> $> hadoop fs -mkdir /user/cloudera/input
> >>>>
> >>>> Then, I tried to do this :
> >>>>
> >>>> $> hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar grep input output2
> >>>> 'dfs[a-z.]+'
> >>>>
> >>>> And it seemed to start working ...... But then it abruptly printed "killed"
> >>>> somehow at the end of the job [scroll down] ?
> >>>>
> >>>> Maybe this is related to why i cant connect ..... ?!
> >>>>
> >>>> 1) the hadoop jar 11/10/14 21:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>>> 11/10/14 21:34:43 WARN snappy.LoadSnappy: Snappy native library not loaded
> >>>> 11/10/14 21:34:43 INFO mapred.FileInputFormat: Total input paths to process : 0
> >>>> 11/10/14 21:34:44 INFO mapred.JobClient: Running job: job_201110142010_0009
> >>>> 11/10/14 21:34:45 INFO mapred.JobClient: map 0% reduce 0%
> >>>> 11/10/14 21:34:55 INFO mapred.JobClient: map 0% reduce 100%
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Job complete: job_201110142010_0009
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Counters: 14
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Job Counters
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Launched reduce tasks=1
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5627
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=5050
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: FileSystemCounters
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=53452
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=86
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Map-Reduce Framework
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce input groups=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Combine output records=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce shuffle bytes=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce output records=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Spilled Records=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Combine input records=0
> >>>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce input records=0
> >>>> 11/10/14 21:34:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> >>>> 11/10/14 21:34:58 INFO mapred.FileInputFormat: Total input paths to process : 1
> >>>> 11/10/14 21:34:58 INFO mapred.JobClient: Running job:
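Tom's suggestion in the quoted message, to verify that the NameNode ports are open, can also be checked from the same JVM that runs the failing `FileSystem.get(...)` call. The sketch below is a plain `java.net.Socket` probe, roughly what `nc -z host port` reports; it is not part of the Hadoop API, and the class name `PortProbe` is hypothetical. The host argument is a placeholder, and the two ports are assumptions taken from the thread: 8020 (the NameNode RPC port that Tom asks about for `hdfs://` connections) and 50070 (the port the working read-only `hftp://` URI used).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a minimal TCP reachability probe, roughly what `nc -z host port` reports.
// Not part of the Hadoop API; it only tells you whether something accepts connections.
final class PortProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host; pass your NameNode's address as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumed ports from the thread: 8020 for hdfs:// (RPC), 50070 for hftp:// (HTTP).
        int[] ports = {8020, 50070};
        for (int port : ports) {
            System.out.println(host + ":" + port + " open? " + isPortOpen(host, port, 2000));
        }
    }
}
```

Run it as `java PortProbe <namenode-host>`. An open port only shows that something is listening; it does not prove that the listener speaks Hadoop RPC, so an `hdfs://` URI can still fail against a reachable port if that port belongs to the web interface rather than the NameNode RPC server.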
-
Re: writing to hdfs via java api
JAX 2011-10-29, 04:19
Hi tom : which log will have info about why a process was Killed?
Sent from my iPad

On Oct 28, 2011, at 11:41 PM, Tom Melendez <[EMAIL PROTECTED]> wrote:

> Hi Jay,
>
> Are you able to look at the logs or the web interface? Can you find
> out why it's getting killed?
>
> Also, can you verify that these ports are open and a process is
> connected to them (maybe with netstat)?
>
> http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
>
> Thanks,
>
> Tom
>
> On Fri, Oct 28, 2011 at 7:57 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>> Thanks tom : Thats interesting....
>>
>> First, I tried, and it complained that the input directory didnt exist, so I
>> ran
>> $> hadoop fs -mkdir /user/cloudera/input
>>
>> Then, I tried to do this :
>>
>> $> hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar grep input output2
>> 'dfs[a-z.]+'
>>
>> And it seemed to start working ...... But then it abruptly printed "killed"
>> somehow at the end of the job [scroll down] ?
>>
>> Maybe this is related to why i cant connect ..... ?!
>>
>> 1) the hadoop jar 11/10/14 21:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 11/10/14 21:34:43 WARN snappy.LoadSnappy: Snappy native library not loaded
>> 11/10/14 21:34:43 INFO mapred.FileInputFormat: Total input paths to process : 0
>> 11/10/14 21:34:44 INFO mapred.JobClient: Running job: job_201110142010_0009
>> 11/10/14 21:34:45 INFO mapred.JobClient: map 0% reduce 0%
>> 11/10/14 21:34:55 INFO mapred.JobClient: map 0% reduce 100%
>> 11/10/14 21:34:57 INFO mapred.JobClient: Job complete: job_201110142010_0009
>> 11/10/14 21:34:57 INFO mapred.JobClient: Counters: 14
>> 11/10/14 21:34:57 INFO mapred.JobClient: Job Counters
>> 11/10/14 21:34:57 INFO mapred.JobClient: Launched reduce tasks=1
>> 11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5627
>> 11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=5050
>> 11/10/14 21:34:57 INFO mapred.JobClient: FileSystemCounters
>> 11/10/14 21:34:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=53452
>> 11/10/14 21:34:57 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=86
>> 11/10/14 21:34:57 INFO mapred.JobClient: Map-Reduce Framework
>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce input groups=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: Combine output records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce shuffle bytes=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce output records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: Spilled Records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: Combine input records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient: Reduce input records=0
>> 11/10/14 21:34:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 11/10/14 21:34:58 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 11/10/14 21:34:58 INFO mapred.JobClient: Running job: job_201110142010_0010
>> 11/10/14 21:34:59 INFO mapred.JobClient: map 0% reduce 0%
>> Killed
>>
>>
>> On Fri, Oct 28, 2011 at 8:24 PM, Tom Melendez <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Jay,
>>>
>>> Some questions for you:
>>>
>>> - Does the hadoop client itself work from that same machine?
>>> - Are you actually able to run the hadoop example jar (in other words,
>>> your setup is valid otherwise)?
>>> - Is port 8020 actually available? (you can telnet or nc to it?)
>>> - What does jps show on the namenode?
>>>
>>> Thanks,
>>>
>>> Tom
>>>
>>> On Fri, Oct 28, 2011 at 4:04 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>>>> Hi guys : Made more progress debugging my hadoop connection, but still