RE: public IP for datanode on EC2
Btw - I figured out the problem.

The jobconf from the remote client carried the SOCKS proxy configuration - the JVMs spawned by the TaskTrackers picked this up and tried to connect through the proxy, which of course didn't work.
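
For reference, the client-side settings that leaked into the jobconf look roughly like this (SocksSocketFactory and hadoop.socks.server are Hadoop's SOCKS proxy support; the proxy endpoint shown here is just a made-up example):

  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <property>
    <name>hadoop.socks.server</name>
    <!-- hypothetical proxy endpoint on the client machine -->
    <value>localhost:1080</value>
  </property>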

This was easy to solve - just had to make the remote initialization script mark hadoop.rpc.socket.factory.class.default as a final variable in the hadoop-site.xml on the server side, so that job-supplied values can't override it.
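
A minimal sketch of that server-side entry (StandardSocketFactory is the stock default, and <final>true</final> is Hadoop's standard mechanism for preventing client/job overrides):

  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.StandardSocketFactory</value>
    <final>true</final>
  </property>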

I am assuming that this would be a good thing to do in general (I can't see why server-side traffic should ever be routed through a proxy!).

Filed https://issues.apache.org/jira/browse/HADOOP-5839 to follow up on the issues uncovered here.

-----Original Message-----
From: Tom White [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 14, 2009 7:07 AM
To: [EMAIL PROTECTED]
Subject: Re: public IP for datanode on EC2

Yes, you're absolutely right.

Tom

On Thu, May 14, 2009 at 2:19 PM, Joydeep Sen Sarma <[EMAIL PROTECTED]> wrote:
> The EC2 documentation points to the use of public 'IP' addresses - whereas using public 'hostnames' seems safe since they resolve to internal addresses from within the cluster (and to public IP addresses from outside).
>
> The only data transfer that I would incur while submitting jobs from outside is the cost of copying the jar files (and any other files meant for the distributed cache). That would be extremely small.
>
>
> -----Original Message-----
> From: Tom White [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 14, 2009 5:58 AM
> To: [EMAIL PROTECTED]
> Subject: Re: public IP for datanode on EC2
>
> Hi Joydeep,
>
> The problem you are hitting may be because port 50001 isn't open,
> whereas from within the cluster any node may talk to any other node
> (because the security groups are set up to do this).
>
> However, I'm not sure this is a good approach. Configuring Hadoop to
> use public IP addresses everywhere should work, but you have to pay
> for all data transfer between nodes (see http://aws.amazon.com/ec2/,
> "Public and Elastic IP Data Transfer"). This is going to get expensive
> fast!
>
> So to get this to work well, we would have to make changes to Hadoop
> so that it was aware of both public and private addresses and used the
> appropriate one: clients would use the public address, while daemons
> would use the private address. I haven't looked at what it would take
> to do this or how invasive it would be.
>
> Cheers,
> Tom
>
> On Thu, May 14, 2009 at 1:37 PM, Joydeep Sen Sarma <[EMAIL PROTECTED]> wrote:
>> I changed the ec2 scripts to have fs.default.name assigned to the public hostname instead of the private hostname (see the config sketch after the trace below).
>>
>> Now I can submit jobs remotely via the socks proxy (the problem below is resolved) - but the map tasks fail with an exception:
>>
>>
>> 2009-05-14 07:30:34,913 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-75-101-199-45.compute-1.amazonaws.com/10.254.175.132:50001. Already tried 9 time(s).
>> 2009-05-14 07:30:34,914 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>> java.io.IOException: Call to ec2-75-101-199-45.compute-1.amazonaws.com/10.254.175.132:50001 failed on local exception: Connection refused
>>        at org.apache.hadoop.ipc.Client.call(Client.java:699)
>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>>        at $Proxy1.getProtocolVersion(Unknown Source)
>>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
>>        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
>>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
>>        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
>>        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
>>        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
>>        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
>>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
>>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
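
For reference, the fs.default.name change described at the top of this quoted message looks roughly like this in hadoop-site.xml (hostname and port are the ones visible in the trace above; the trace also shows the public hostname resolving to the internal address 10.254.175.132 from within the cluster):

  <property>
    <name>fs.default.name</name>
    <!-- public hostname taken from the error trace above -->
    <value>hdfs://ec2-75-101-199-45.compute-1.amazonaws.com:50001/</value>
  </property>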