RE: public IP for datanode on EC2
Joydeep Sen Sarma 2009-05-14, 12:37
I changed the EC2 scripts so that fs.default.name is assigned the public hostname (instead of the private hostname).
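For reference, the resulting entry in the cluster's hadoop-site.xml looks roughly like this (a sketch only; the hostname and port are the ones appearing in the log below, and the exact file the EC2 scripts rewrite may differ):

  <property>
    <name>fs.default.name</name>
    <!-- public EC2 hostname; value taken from the log below -->
    <value>hdfs://ec2-75-101-199-45.compute-1.amazonaws.com:50001</value>
  </property>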

Now I can submit jobs remotely via the SOCKS proxy (the problem below is resolved), but the map tasks fail with an exception:
2009-05-14 07:30:34,913 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-75-101-199-45.compute-1.amazonaws.com/10.254.175.132:50001. Already tried 9 time(s).
2009-05-14 07:30:34,914 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Call to ec2-75-101-199-45.compute-1.amazonaws.com/10.254.175.132:50001 failed on local exception: Connection refused
        at org.apache.hadoop.ipc.Client.call(Client.java:699)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at org.apache.hadoop.mapred.Child.main(Child.java:153)
Strangely enough, job submissions from nodes within the EC2 cluster work just fine. I looked at the job.xml files of jobs submitted locally and remotely and don't see any relevant differences.

Totally foxed now.

Joydeep

-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 13, 2009 9:38 PM
To: [EMAIL PROTECTED]
Cc: Tom White
Subject: RE: public IP for datanode on EC2

Thanks Philip. Very helpful (and great blog post)! This seems to make basic DFS command-line operations work just fine.

However, I am hitting a new error during job submission (running hadoop-0.19.0):

2009-05-14 00:15:34,430 ERROR exec.ExecDriver (SessionState.java:printError(279)) - Job Submission failed with exception 'java.net.UnknownHostException(unknown host: domU-12-31-39-00-51-94.compute-1.internal)'
java.net.UnknownHostException: unknown host: domU-12-31-39-00-51-94.compute-1.internal
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:791)
        at org.apache.hadoop.ipc.Client.call(Client.java:686)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:176)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:75)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.JobClient.getFs(JobClient.java:469)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:603)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
Looking at the stack trace and the code, it seems this is happening because the JobClient asks the JobTracker for the mapred system directory, and the JobTracker replies with a path name qualified against its own fs.default.name setting. Unfortunately, the standard EC2 scripts assign that setting to the internal hostname of the Hadoop master.
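Concretely, the stock EC2 scripts generate an entry along these lines (a sketch; the private hostname is the one from the exception above, and the 50001 port is assumed to match the logs earlier in the thread), so every path the JobTracker qualifies against it embeds a hostname that only resolves from inside EC2:

  <property>
    <name>fs.default.name</name>
    <!-- internal hostname, resolvable only within EC2; taken from the exception above -->
    <value>hdfs://domU-12-31-39-00-51-94.compute-1.internal:50001</value>
  </property>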

Is there any downside to using public hostnames instead of the private ones in the EC2 starter scripts?

Thanks for the help,

Joydeep

From: Philip Zeyliger [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 13, 2009 2:40 PM
To: [EMAIL PROTECTED]
Subject: Re: public IP for datanode on EC2

On Tue, May 12, 2009 at 9:11 PM, Joydeep Sen Sarma <[EMAIL PROTECTED]> wrote:

You could use SSH to set up a SOCKS proxy between your machine and
EC2, and set org.apache.hadoop.net.SocksSocketFactory as the
socket factory.
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
has more information.
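Roughly, the setup described in that post is: open a dynamic tunnel with ssh -D 6666 <user>@<master> (6666 is an arbitrary local port), then add entries like the following to the client-side hadoop-site.xml (a sketch based on the linked post, not on this thread):

  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <!-- route Hadoop RPC through a SOCKS proxy -->
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <property>
    <name>hadoop.socks.server</name>
    <!-- local end of the ssh -D tunnel; the port is an arbitrary choice -->
    <value>localhost:6666</value>
  </property>

With that in place, client commands such as hadoop fs -ls / from the remote machine are carried through the tunnel.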