George Stathis
2010-03-19, 15:00
Andrew Purtell
2010-03-19, 16:45
Patrick Hunt
2010-03-19, 16:46
Andrew Purtell
2010-03-19, 17:16
George Stathis
2010-03-19, 18:21
George Stathis
2010-03-19, 22:12
Andrew Purtell
2010-03-20, 21:01
Andrew Purtell
2010-03-20, 21:20
George Stathis
2010-03-20, 23:11
Andrew Purtell
2010-03-21, 19:10
George Stathis
2010-03-21, 23:10
-
Remote Java client connection into EC2 instance (George Stathis, 2010-03-19, 15:00)
This has come up before (<http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200909.mbox/%[EMAIL PROTECTED]%3E>), but I'm still unclear as to whether this is possible: remotely connecting to an EC2 instance using the Java client library. It's not for production purposes. We have developed a DAO layer with the HBase Java API, and it works great when the app lives on the same machine as HBase (e.g. our development laptops). We now want to set up a continuous build process to run our unit tests. We already have a build machine that does not live on EC2 (we had it from before). That machine is too small to run HBase, even in pseudo-distributed mode, but it can still compile the code. So the thought is to have that machine run the build and point to a remote EC2 dev instance of HBase for unit testing.

Now, I have gone through a lot of threads and posts and have opened up all required ports (I think) on EC2: 60000, 60020 and 2181 (I can telnet into them). I have one test EC2 instance running in pseudo-distributed mode to test the remote connection. I attempt to run a single unit test. From the client side, I see:

    [...]
    2010-03-19 10:29:18,449 INFO [ClientCnxn - 869] - Attempting connection to server /184.73.xxx.yyy:2181
    2010-03-19 10:29:18,496 INFO [ClientCnxn - 785] - Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.1.16:55145 remote=/184.73.xxx.yyy:2181]
    2010-03-19 10:29:18,498 INFO [ClientCnxn - 937] - Server connection successful
    [...]

and in the ZooKeeper logs on the EC2 instance, I see:

    [...]
    2010-03-19 10:29:18,512 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /71.174.www.zzz:55145 lastZxid 0
    2010-03-19 10:29:18,512 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12776cfdefd0003
    2010-03-19 10:29:18,516 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12776cfdefd0003 valid:true
    [...]

So some connection handshake is established. Then, nothing. It just hangs.

Any suggestions on where to go from here, or am I fighting a losing battle?

Thank you in advance for your time.

-GS
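A rough Java equivalent of the telnet check described above, assuming a plain TCP connect with a timeout is a fair stand-in for telnet. The class `PortCheck` is a hypothetical helper, not part of HBase. A successful connect only shows that a port is open; it says nothing about which hostnames the master will later hand back to the client.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a telnet-style "is this port open?" check.
public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // Ports mentioned in the thread: master, regionserver, ZooKeeper client port.
        int[] ports = {60000, 60020, 2181};
        for (int port : ports) {
            System.out.println(host + ":" + port + " -> "
                    + (isReachable(host, port, 2000) ? "open" : "unreachable"));
        }
    }
}
```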
-
Re: Remote Java client connection into EC2 instance (Andrew Purtell, 2010-03-19, 16:45)
The IP addresses assigned on the cluster are all internal ones, so when the regionservers do a reverse lookup, they get something like foo.internal. Then they report this name to the master, which hands these names out to the client library as region locations. So while you can telnet to 60020 on the slaves because you know their public DNS names, the client library is only able to learn of the internal ones.

Some options:

1) Run your clients up in the EC2 cloud also.

2) Use a connector like Stargate or the Thrift server, which can in effect proxy your requests to the EC2-hosted cluster.

3) Grab the latest scripts from the 0.20 branch in SVN. In $HOME/.hbase-<cluster>-instances will be the list of instance identifiers of the slaves. Do:

    ec2-describe-instances `cat ~/.hbase-<cluster>-instances` | grep INSTANCE | grep running | awk '{print $4, $5}'

This will give you a mapping between private and public names. Dump entries into your /etc/hosts which map the public IP (use dig to look up) to the private name. Yes, it's not a nice hack.

4) You can use SSH as a SOCKS 5 proxy (ssh -f -N -D <local-port> <remote>), which will also forward DNS requests, but to do it that way you'd have to hack the client library some.

- Andy
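Option 3 can be sketched in Java under a few stated assumptions: the input is the raw `ec2-describe-instances` output, the public DNS name is whitespace-separated field 4 and the private name field 5 (the same columns the awk command prints), and `InetAddress` resolution stands in for `dig`. `HostsEntries` and its methods are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: turns ec2-describe-instances output into /etc/hosts lines.
public class HostsEntries {
    /**
     * Maps each running instance's public IP to its private DNS name, so that a
     * client resolving "foo.internal" reaches the public address instead.
     * Assumes column 4 = public DNS name and column 5 = private DNS name.
     */
    public static List<String> toHostsLines(List<String> describeOutput,
                                            Function<String, String> resolvePublicIp) {
        List<String> entries = new ArrayList<>();
        for (String line : describeOutput) {
            String[] f = line.trim().split("\\s+");
            // Roughly the same filter as: grep INSTANCE | grep running
            if (f.length < 5 || !f[0].equals("INSTANCE") || !line.contains("running")) {
                continue;
            }
            String publicDns = f[3];   // awk $4
            String privateDns = f[4];  // awk $5
            String ip = resolvePublicIp.apply(publicDns);
            if (ip != null) {
                entries.add(ip + "\t" + privateDns);
            }
        }
        return entries;
    }

    /** Default resolver standing in for dig; returns null if the name does not resolve. */
    public static String resolve(String name) {
        try {
            return InetAddress.getByName(name).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        for (String entry : toHostsLines(lines, HostsEntries::resolve)) {
            System.out.println(entry);
        }
    }
}
```

Usage would be to pipe the same `ec2-describe-instances` invocation into `java HostsEntries` and review the output before appending it to /etc/hosts. Passing the resolver in as a function keeps the parsing testable without real DNS.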
-
Re: Remote Java client connection into EC2 instance (Patrick Hunt, 2010-03-19, 16:46)
On the client, try something like "echo stat|nc <host> 2181", where <host> is the ZK server. If you get something back, the ZooKeeper portion of the communication is at least working properly.

http://bit.ly/dglVld

I can't comment on what else is going on at the HBase level, though.

Patrick
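For a build machine without `nc`, Patrick's check can be reproduced in Java. This sketch assumes the server answers a four-letter command such as `stat` with plain text and then closes the connection, which is how `echo stat|nc <host> 2181` behaves; `ZkStat` is a hypothetical helper, not a ZooKeeper API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends a ZooKeeper four-letter command and returns the reply.
public class ZkStat {
    public static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            // Fail with SocketTimeoutException instead of hanging if no reply arrives.
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server writes its reply and closes the connection, so read to EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.print(fourLetterWord(host, 2181, "stat", 5000));
    }
}
```

As with `nc`, a non-empty reply only confirms the ZooKeeper leg; it does not test the regionserver addresses the client is later given.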
-
Re: Remote Java client connection into EC2 instance (Andrew Purtell, 2010-03-19, 17:16)
Expanding on my point #3: if you run your own DNS server that accepts updates, you can use nsupdate to maintain a dynamic shadow of the internal zone with mappings to public IPs. Update the records when the cluster comes up; remove them when the cluster is terminated.

You would also need to figure out how best to update should an instance fail and be replaced, but this should hopefully be a rare event. Elastic IPs can help, though each account only gets 5 of them without justification to AWS.

- Andy
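A sketch of the nsupdate input described above, assuming your DNS server accepts unsigned dynamic updates for the shadow zone (TSIG keys, zone selection, and error handling are left out) and that the private names are fully qualified. `NsupdateScript` is a hypothetical helper; its output is meant to be piped into `nsupdate` on stdin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds nsupdate command scripts for the shadow zone.
public class NsupdateScript {
    /** Publishes each private EC2 name with the instance's current public IP. */
    public static String addRecords(String dnsServer, Map<String, String> privateToPublicIp, int ttl) {
        List<String> cmds = new ArrayList<>();
        cmds.add("server " + dnsServer);
        // TreeMap gives a stable, sorted order so the script is reproducible.
        for (Map.Entry<String, String> e : new TreeMap<>(privateToPublicIp).entrySet()) {
            // Drop any stale A record first, so a replaced instance gets its new IP.
            cmds.add("update delete " + e.getKey() + " A");
            cmds.add("update add " + e.getKey() + " " + ttl + " A " + e.getValue());
        }
        cmds.add("send");
        return String.join("\n", cmds) + "\n";
    }

    /** Removes the records again when the cluster is terminated. */
    public static String removeRecords(String dnsServer, Iterable<String> privateNames) {
        List<String> cmds = new ArrayList<>();
        cmds.add("server " + dnsServer);
        for (String name : privateNames) {
            cmds.add("update delete " + name + " A");
        }
        cmds.add("send");
        return String.join("\n", cmds) + "\n";
    }
}
```

Deleting before adding in the same update also covers the replaced-instance case mentioned above: rerunning `addRecords` with the new mapping overwrites the old address.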
-
Re: Remote Java client connection into EC2 instance (George Stathis, 2010-03-19, 18:21)
Andy, thanks for the response.

Switching to connectors would mean a significant DAO rewrite for us, so that's out; plus, we would not use connectors for production. The DNS approach is probably out as well, as I don't think we have that much control with the ISP that hosts our build server (we are a small startup, so we are on the cheap right now). Rewriting the Java client is not an option either.

So I guess moving the build machine to EC2 might be the best option for us. This definitely helps us move on. Thanks again for taking the time.

-GS
-
Re: Remote Java client connection into EC2 instance (George Stathis, 2010-03-19, 22:12)
Well, here's another possible option: Rackspace. If EC2 is such a big headache to set up for external access, would Rackspace be a good alternative (cost allowing)? What's people's experience running on it? It seems that they support non-NAT public IPs: http://www.rackspacecloud.com/cloud_hosting_products/servers/compare

-GS
-
Re: Remote Java client connection into EC2 instance (Andrew Purtell, 2010-03-20, 21:01)
The last time I checked, the Rackspace instance types had less disk. For example, the 8192 MB option has 320 GB of disk. For roughly the same price and RAM on EC2, you get 840 GB of instance storage (m1.large). Presumably, for an HBase/Hadoop deployment, storage capacity is a top concern.

And note that you have an access headache really only because you eschew connectors (either Stargate (REST) or the Thrift one).

- Andy
-
Re: Remote Java client connection into EC2 instance (Andrew Purtell, 2010-03-20, 21:20)
Something you might want to look into is the EC2 scripts on Hadoop core trunk (0.21-dev). These have moved beyond bash scripts tied to the EC2 command line tools to a set of Python scripts which use libcloud to abstract away the infrastructure mechanics. So it's about equal effort to deploy a Hadoop cluster using those scripts on EC2, or Rackspace, or ...

Modifying those scripts to include HBase in the Hadoop images, start the master on the namenode and the regionservers on the datanodes, and figure out how to manage a separate ZK quorum ensemble might not be too difficult. Or, running a single ZK instance on the master might be sufficient. This is the direction I want to take our EC2 scripts. Hopefully I can find time to work on this with Tom White soon.

- Andy
-
Re: Remote Java client connection into EC2 instance (George Stathis, 2010-03-20, 23:11)
Very true. My assumption is that those connectors introduce some level of overhead over the Java client, even though they are thin clients. Also, I understand that:

1) Stargate is still being developed and may not be quite ready for production until 0.21. Andy, since you are the main committer, could you share your thoughts regarding using it for production purposes (possible performance issues compared with the Java client, etc.)?

2) Thrift is more mature (currently used at StumbleUpon) but is lagging in features and is not supported by any core devs.

These were the main reasons why I decided not to use the connectors for our DAO layer. I'm new to this particular stack, so if I'm wrong or missed something major, I'm definitely open to suggestions regarding these connectors.

For now, we may just use Rackspace for automated builds and stick with EC2 for production due to the storage benefits. We may also just move our CI server to EC2 and get this over with.

-GS
-
Re: Remote Java client connection into EC2 instance (Andrew Purtell, 2010-03-21, 19:10)
Hi George,

I understand your concerns, and they are certainly valid.

> My assumption is that those connectors introduce some level of
> overhead over the Java Client even though they are thin clients.

Yes. Using a REST interface, for example, will add the overhead of an HTTP transaction on each resource. So you should use HTTP/1.1 persistent connections and try hard to batch up operations: scanners, and Stargate multi-put. I think Stargate is acceptable for production use, except for the multiuser stuff, which is disabled by default. Jetty, Jersey, and the servlet are in my experience all fast and stable.

The Thrift connector, in contrast, is a streaming interface. I would expect the performance you can achieve with it to be within 5% of what you can get with the native Java API with the same pattern of use. We all support the Thrift connector; it is just not under active development.

For your case I was thinking more of the Thrift connector.

> We may also just move our CI server to EC2 and get this
> over with.

The Hudson CI has an EC2 cloud provider plugin:

http://wiki.hudson-ci.org/display/HUDSON/Amazon+EC2+Plugin

- Andy
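The batching advice above comes down to round-trip arithmetic: sending N rows one per request costs N HTTP transactions, while batches of size B cost ceil(N / B). A minimal, generic Java sketch follows. `Batcher` is hypothetical, and the `send` callback stands in for whatever actually issues one request (for example, a single Stargate multi-put over a persistent connection); that request's wire format is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: groups items so each request carries many of them.
public class Batcher {
    /**
     * Splits items into consecutive batches of at most batchSize and hands each
     * batch to send, so N items cost ceil(N / batchSize) requests instead of N.
     * Returns the number of requests issued.
     */
    public static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> send) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int requests = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so the callback cannot be affected by later changes to items.
            send.accept(new ArrayList<>(items.subList(start, end)));
            requests++;
        }
        return requests;
    }
}
```

For 10,000 rows and a batch size of 500, this issues 20 requests instead of 10,000; the same grouping applies whether the transport is REST, Thrift, or the native client.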
-
Re: Remote Java client connection into EC2 instance (George Stathis, 2010-03-21, 23:10)
Thanks, Andy. It just so happens that we are using Hudson as our CI, so we may take a good look at that EC2 plugin.

-GS