-
HTTP transport? Doug Cutting 2009-09-11, 21:41
I'm considering an HTTP-based transport for Avro as the preferred,
high-performance option.

HTTP has lots of advantages. In particular, it already has
- lots of authentication, authorization and encryption support;
- highly optimized servers;
- monitoring, logging, etc.

Tomcat and other servlet containers support async NIO, where a thread is not required per connection. A servlet can process bulk data with a single copy to and from the socket (bypassing stream buffers). Calls can be multiplexed over a single HTTP connection using Comet events.

http://tomcat.apache.org/tomcat-6.0-doc/aio.html

Zero copy is not an option for servlets that generate arbitrary data, but one can specify a file/start/length tuple and Tomcat will use sendfile to write the response. That means that while HDFS datanode file reads could not be done via RPC, they could be done via HTTP with zero-copy. If authentication and authorization are already done in the HTTP server, this may not be a big loss. The HDFS client might make two HTTP requests, one to read a file's data, and another to read its checksums. The server would then stream the entire block to the client using sendfile, using TCP flow control as today.

Thoughts?

Doug
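[Editor's sketch] The file/start/length idea above can be illustrated outside of Tomcat. The following is a minimal Python sketch, not Tomcat's or HDFS's implementation: it relies on the standard library's socket.sendfile(), which delegates to the kernel's sendfile where the platform supports it and otherwise falls back to ordinary send() calls. The helper names and the temporary "block file" are hypothetical.

```python
import os
import socket
import tempfile


def send_file_range(sock, path, start, length):
    """Send `length` bytes of `path`, beginning at byte `start`, over `sock`.

    socket.sendfile() lets the kernel move the bytes where os.sendfile() is
    available and falls back to plain send() otherwise.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f, offset=start, count=length)


def demo():
    # Hypothetical stand-in for a block file on a datanode.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789abcdef")
        path = tmp.name
    server, client = socket.socketpair()
    try:
        sent = send_file_range(server, path, start=4, length=8)
        received = client.recv(64)
    finally:
        server.close()
        client.close()
        os.unlink(path)
    return sent, received
```

The point of the tuple is that the server process never touches the bytes itself; it only names the range to send.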
-
Re: HTTP transport? Scott Carey 2009-09-26, 00:36
Ok, I have some thoughts on this. I might be misinterpreting some use cases here, however.

HTTP is very useful and typically performs very well. It has lots of things built-in too. In addition to what you mention, it has a caching mechanism built-in, range queries, and all sorts of ways to tag along state if needed. To top it off there are a lot of testing and debugging tools available for it. So from that front using it is very attractive.

However, in my experience zero-copy is not going to be much of a gain performance-wise for this sort of application, and will limit what can be done. As long as a servlet doesn't transcode data and mostly copies, it will be very fast - many multiples of gigabit ethernet speed per CPU - far more than most disk setups will handle for a while. Furthermore, it is easier to optimize disk requests to be 'sequentially chunky' if it goes through the JVM. And I suspect that for many use cases, optimizing disk I/O is more valuable than a little bit of extra CPU spent copying data into and out of the process.

Additionally, I'm not sure CRC checking should occur on the client. TCP/IP already checksums packets, so network data corruption over HTTP is not a primary concern. The big concern is silent data corruption on the disk. For the DataNode use case, it should find such errors as early as possible, and not rely on clients discovering errors. Then it can coordinate with the NameNode on fixing the block or discarding it. So if it has to check the file integrity anyway, there is no reason to worry about zero-copy. Avoiding the extra request for the CRC data at least partially counters the loss of zero-copy.

Additionally, embedding Tomcat tends to be more tricky than Jetty, though that can be overcome. One might argue that we don't even want a servlet container, we just want an HTTP connector. The Servlet API is familiar, but for a high performance transport it might just be overhead and restrictive. Direct access to Tomcat's NIO connector might be significantly lighter-weight and more flexible.

Tomcat's NIO connector implementation works great and I have had great success with up to 10K connections with the pure Java connector using ordinary byte buffers and about 20 servlet threads. But if a large number of open connections is not needed (less than about 5x the number of CPU core threads) then thread-per-connection servlet containers work ok too. These sorts of implementation details can evolve over time, however.

Just my 2c
-Scott
-
Re: HTTP transport? Doug Cutting 2009-09-28, 17:01
Scott Carey wrote:
> HTTP is very useful and typically performs very well. It has lots of
> things built-in too. In addition to what you mention, it has a
> caching mechanism built-in, range queries, and all sorts of ways to
> tag along state if needed. To top it off there are a lot of testing
> and debugging tools available for it. So from that front using it is
> very attractive.

Glad you agree!

> However, in my experience zero-copy is not going to be much of a gain
> performance-wise for this sort of application, and will limit what
> can be done. As long as a servlet doesn't transcode data and mostly
> copies, it will be very fast - many multiples of gigabit ethernet
> speed per CPU - far more than most disk setups will handle for a
> while.

In MapReduce, datanodes are also running map and reduce tasks, so we'd like it if datanodes not only keep up with disks and networks, but also use minimal CPU to do so. Zero-copy on the datanode has been shown to significantly help MapReduce benchmarks.

That said, zero copy may or may not be significantly better than one-copy. I intend to benchmark that. But the important thing to measure is not just throughput but also idle CPU.

> Additionally, I'm not sure CRC checking should occur on the
> client. TCP/IP already checksums packets, so network data corruption
> over HTTP is not a primary concern. The big concern is silent data
> corruption on the disk.

I believe that disks are the largest source of data corruption, but I am not confident they are the only source.

HDFS uses end-to-end checksums. As data is written to HDFS it is immediately checksummed on the client. This checksum then lives with the data and is validated on the client immediately before the data is returned to the application. The goal is to catch corruption wherever it may occur, on disks, on the network, or while buffered in memory.

In addition, the checksum is validated after data is transmitted to datanodes but before blocks are stored, so that initial network and memory corruptions are caught early and the writing process fails, rather than permitting an application to write corrupt data.

Finally, datanodes periodically scan for corrupt blocks on disks, replacing them with non-corrupt replicas, decreasing the chance that over time all replicas become corrupt.

> Additionally, embedding Tomcat tends to be more tricky than Jetty,
> though that can be overcome. One might argue that we don't even want
> a servlet container, we just want an HTTP connector. The Servlet API
> is familiar, but for a high performance transport it might just be
> overhead and restrictive. Direct access to Tomcat's NIO connector
> might be significantly lighter-weight and more flexible. Tomcat's NIO
> connector implementation works great and I have had great success
> with up to 10K connections with the pure Java connector using
> ordinary byte buffers and about 20 servlet threads.

I hope to start benchmarking bulk data RPC over the next few weeks. I'll probably start with a servlet using Jetty, then see if I can increase throughput and decrease CPU utilization through the use of things like Tomcat's NIO connector, Grizzly, etc.

Doug
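[Editor's sketch] The three validation points described above (checksum on the writing client, check on the datanode before storing, check on the reading client) can be sketched as follows. This is an illustration only: zlib.crc32 over the whole buffer stands in for whatever checksum layout HDFS actually uses, and the function names and the dict used as a "stored block" are hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    # CRC-32 as a stand-in; the thread does not describe HDFS's real format.
    return zlib.crc32(data) & 0xFFFFFFFF


def client_write(data: bytes):
    """Writing client: checksum the data immediately."""
    return data, checksum(data)


def datanode_store(data: bytes, expected: int) -> dict:
    """Datanode: validate after transmission but before storing the block."""
    if checksum(data) != expected:
        raise IOError("data corrupted in transit; failing the write")
    return {"data": data, "crc": expected}


def client_read(stored: dict) -> bytes:
    """Reading client: validate just before handing data to the application."""
    if checksum(stored["data"]) != stored["crc"]:
        raise IOError("corrupt block; another replica should be tried")
    return stored["data"]
```

Because the checksum travels with the data from the first client to the last, corruption on disk, on the network, or in memory all show up as the same mismatch.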
-
Re: HTTP transport? Owen O'Malley 2009-09-28, 17:59
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote:
> I'm considering an HTTP-based transport for Avro as the preferred,
> high-performance option.

I've got concerns about this. Both tactical and strategic. The tactical problem is that I need to get security (both Kerberos and token) in to 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be done roughly in 5 months. If you switch off of the current RPC code base to a completely new RPC code base, I don't see that happening.

> HTTP has lots of advantages. In particular, it already has
> - lots of authentication, authorization and encryption support;
> - highly optimized servers;
> - monitoring, logging, etc.

It also has a couple of disadvantages:
- poor integration with kerberos
- very expensive on the wire encryption (ssl)

> Tomcat and other servlet containers support async NIO, where a
> thread is not required per connection.

I'm also concerned about the weight of Tomcat. Everything I've read about it says that it takes a lot more memory and cpu than Jetty. I think a solution that requires Tomcat may be problematic...

-- Owen
-
Re: HTTP transport? Sanjay Radia 2009-09-28, 20:13
On Sep 11, 2009, at 2:41 PM, Doug Cutting wrote:
> I'm considering an HTTP-based transport for Avro as the preferred,
> high-performance option.
>
> HTTP has lots of advantages. In particular, it already has
> - lots of authentication, authorization and encryption support;
> - highly optimized servers;
> - monitoring, logging, etc.
>

Q. Is this to replace the client-DN data-transfer protocol or for ALL Hadoop rpc?
Q. Was authentication one of your main motivations? The current plans for authentication are centered around kerberos. HTTP does not fit in too well in that picture.

sanjay
-
Re: HTTP transport? Doug Cutting 2009-09-28, 22:42
Owen O'Malley wrote:
> I've got concerns about this. Both tactical and strategic. The tactical
> problem is that I need to get security (both Kerberos and token) in to
> 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be
> done roughly in 5 months. If you switch off of the current RPC code base
> to a completely new RPC code base, I don't see that happening.

What transport do you expect to use with Avro? If wire-compatibility is a goal, and that includes access from languages besides Java, then we must use a transport that's well-specified and Java-independent. HTTP is both of these. The existing Hadoop RPC protocol is not.

We could adapt Hadoop's existing RPC transport to be well-specified and language independent. This is perhaps not a huge task, but it feels to me a bit like re-inventing much of what's already in HTTP clients and servers these days: connection-pooling, async servers, etc. Plus we take on the onus of fully specifying the transport, so that it may be implemented in other languages, and we need to provide some alternate implementations to demonstrate this.

Do you feel our existing RPC framework's transport is actually more scalable and reliable than, say, Jetty? Do you think it would be substantially harder to add, e.g., token-based security to Jetty than to a homegrown server?

> [ HTTP ] also has a couple of disadvantages:
> - poor integration with kerberos

Do you think it would be substantially harder to integrate Kerberos with Jetty than with a homegrown protocol and server?

> - very expensive on the wire encryption (ssl)

If we don't use HTTP, will we be providing on-wire encryption? If not, this is moot.

Finally, we need to have secure HTTP-based access anyway, right? If we use HTTP as our RPC transport mightn't we reuse most of that effort?

Doug
-
Re: HTTP transport? Sanjay Radia 2009-09-29, 18:52
On Sep 28, 2009, at 3:42 PM, Doug Cutting wrote:
> Owen O'Malley wrote:
> > I've got concerns about this. Both tactical and strategic. The tactical
> > problem is that I need to get security (both Kerberos and token) in to
> > 0.22. I'd really like to get Avro RPC into 0.22. I'd like both to be
> > done roughly in 5 months. If you switch off of the current RPC code base
> > to a completely new RPC code base, I don't see that happening.
>
> What transport do you expect to use with Avro? If wire-compatibility is
> a goal, and that includes access from languages besides Java, then we
> must use a transport that's well-specified and Java-independent. HTTP
> is both of these. The existing Hadoop RPC protocol is not.
>
> We could adapt Hadoop's existing RPC transport to be well-specified
> and language independent. This is perhaps not a huge task, but it feels
> to me a bit like re-inventing much of what's already in HTTP clients and
> servers these days: connection-pooling, async servers, etc.
>

Wrt connection pooling/async servers: Can't we use the same libraries that Jetty and Tomcat use? Grizzly?

> [...]
> Do you think it would be substantially harder to integrate Kerberos with
> Jetty than with a homegrown protocol and server?
>
> > - very expensive on the wire encryption (ssl)
>
> If we don't use HTTP, will we be providing on-wire encryption? If not,
> this is moot.
>

Yes we are expecting to use encryption down the road.

>
> Finally, we need to have secure HTTP-based access anyway, right? If we use
> HTTP as our RPC transport mightn't we reuse most of that effort?
>
> Doug
>
-
Re: HTTP transport? Doug Cutting 2009-09-29, 19:43
Sanjay Radia wrote:
> Wrt connection pooling/async servers: Can't we use the same libraries
> that Jetty and Tomcat use?
> Grizzly?

Grizzly also supports HTTP. Choosing Grizzly is independent of choosing HTTP as a wire transport or choosing a server.

The question I'm asking now is about the wire format, whether we wish to precede each RPC request with something like "GET /avro/org.apache.hadoop.hdfs.NameNode HTTP/1.1\n" and each response with "HTTP/1.1 200 OK\n", plus a couple of other headers in each case (e.g., Content-Type and Content-Length). I think there are great benefits to using a single, standard protocol on the wire. Which server and client implementations we use will be determined by performance, features, etc. But using a standard wire format will greatly simplify things as we attempt to support multiple languages. Since we want to provide browser access, we're compelled to support HTTP. So the question is, are there compelling reasons why HTTP should not be used for other, non-browser, access?

> Yes we are expecting to use encryption down the road.

Do we expect to use something different from TLS? With its 'resume' feature, is TLS performance unacceptable? Would we implement some other encryption protocol, or use a non-standards-based encryption protocol?

Doug
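[Editor's sketch] To make the wire-format question concrete, here is a minimal Python sketch of wrapping an opaque RPC payload in an HTTP/1.1 request line or status line plus Content-Type and Content-Length. Several details are assumptions rather than anything settled in the thread: POST is used because the request carries a body (the message above writes GET), lines end in CRLF as HTTP/1.1 requires, and application/octet-stream is only a placeholder content type. The function names are hypothetical.

```python
CRLF = "\r\n"
CONTENT_TYPE = "application/octet-stream"  # placeholder, not an agreed type


def _frame(start_line, extra_headers, payload):
    lines = [start_line, *extra_headers,
             f"Content-Type: {CONTENT_TYPE}",
             f"Content-Length: {len(payload)}"]
    # A blank line separates the header block from the binary body.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii") + payload


def frame_request(path, host, payload, method="POST"):
    """Prefix an RPC request payload with an HTTP/1.1 request head."""
    return _frame(f"{method} {path} HTTP/1.1", [f"Host: {host}"], payload)


def frame_response(payload):
    """Prefix an RPC response payload with a minimal HTTP/1.1 response head."""
    return _frame("HTTP/1.1 200 OK", [], payload)


def header_overhead(framed, payload):
    """Bytes added on the wire by the HTTP head."""
    return len(framed) - len(payload)
```

Calling header_overhead() on a framed message gives the fixed per-call cost of the head, which is the number to weigh against payload size when RPCs are small.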
-
Re: HTTP transport? stack 2009-09-29, 20:38
On Tue, Sep 29, 2009 at 12:43 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> The question I'm asking now is about the wire format, whether we wish to
> precede each RPC request with something like "GET
> /avro/org.apache.hadoop.hdfs.NameNode HTTP/1.1\n" and each response with
> "HTTP/1.1 200 OK\n", plus a couple of other headers in each case (e.g.,
> Content-Type and Content-Length). I think there are great benefits to using
> a single, standard protocol on the wire. Which server and client
> implementations we use will be determined by performance, features, etc.
> But using a standard wire format will greatly simplify things as we attempt
> to support multiple languages. Since we want to provide browser access,
> we're compelled to support HTTP. So the question is, are there compelling
> reasons why HTTP should not be used for other, non-browser, access?

I like the idea of using a proven transport.

The HTTP request and response header verbiage seems profligate if what's being passed is small.

What do you think the path on the first line will look like? Will it be a method name or will it be customizable? (In hbase, it might be nice to have path be /tablename/row/family/qualifier etc).

St.Ack
-
Re: HTTP transport? Doug Cutting 2009-09-29, 21:08
stack wrote:
> What do you think the path on the first line will look like? Will it be a method
> name or will it be customizable?

Avro RPC currently includes the message name in the payload, so, unless that changes, for Avro RPC, we'd probably use a different URL per protocol. As a convention we might use the namespace-qualified protocol name as the URL path.

Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull stuff out of Avro's payload into HTTP headers. The downside of that would be that, if we still wish to support non-HTTP transports, we'd end up with duplicated logic.

If we fully embraced HTTP as Avro's primary RPC transport then it might make sense to move the message name to the URL and to use the HTTP return code to determine whether the response is an error or not. Avro's RPC payload also currently includes request and response metadata, which are functionally redundant with HTTP headers.

> (In hbase, it might be nice to have path be /tablename/row/family/qualifier etc).

It sounds like you'd perhaps like to be able to put RPC request parameters into the URL? I don't see that being done automatically in a general way for arbitrary parameter types without the URLs getting really ugly and adding a lot of complexity. For this it might be better to write a servlet filter that constructs the appropriate Avro-format request and forwards it to the RPC url.

Doug
-
Re: HTTP transport? stack 2009-09-29, 21:57
On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
> stuff out of Avro's payload into HTTP headers. The downside of that would
> be that, if we still wish to support non-HTTP transports, we'd end up with
> duplicated logic.
>

There would be loads of upside I'd imagine if there was a natural mapping of avro payload specifiers and metadata up into http headers in terms of visibility.

So, are we talking about doing something like the following for a request/response:

  GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
  Host: www.example.com


  HTTP/1.1 200 OK
  Date: Mon, 23 May 2005 22:38:34 GMT
  Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
  Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
  Etag: "3f80f-1b6-3e1cb03b"
  Accept-Ranges: bytes
  Content-Length: 438
  Connection: close
  Content-Type: X-avro/binary


... or some variation on above on each and every RPC?

St.Ack
-
Re: HTTP transport? Raghu Angadi 2009-09-29, 22:11
Doug Cutting wrote:
> stack wrote:
>> What do you think the path on the first line will look like? Will it be a
>> method name or will it be customizable?
>
> Avro RPC currently includes the message name in the payload, so, unless
> that changes, for Avro RPC, we'd probably use a different URL per
> protocol. As a convention we might use the namespace-qualified protocol
> name as the URL path.
>
> Alternately, we could try to make Avro's RPC more HTTP-friendly, and
> pull stuff out of Avro's payload into HTTP headers. The downside of
> that would be that, if we still wish to support non-HTTP transports,
> we'd end up with duplicated logic.

Keeping Avro payload independent of transport seems pretty useful, at least for now. As I understand, Avro payload is Avro 'proper' (i.e. it is supported in all the languages supported by Avro... and other goodies).

I just noticed AVRO-129 and it seems like a great example of using HTTP transport.

Does this mean current Avro RPC transport (an improved version of Hadoop RPC) can still exist as long as it is supported by developers?

Where does security lie : Avro or Transport layer?

If it is part of Avro : transport layer does not matter for security.

If it is part of transport : How does an app get hold of required information (e.g. user identity).

Maybe 'transceiver' can have an interface that can transfer security information between transport layer and Avro.

Raghu.
-
Re: HTTP transport? Doug Cutting 2009-09-29, 23:17
stack wrote:
> So, are we talking about doing something like the following for a
> request/response:
>
> GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
> Host: www.example.com
>
>
> HTTP/1.1 200 OK
> Date: Mon, 23 May 2005 22:38:34 GMT
> Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
> Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
> Etag: "3f80f-1b6-3e1cb03b"
> Accept-Ranges: bytes
> Content-Length: 438
> Connection: close
> Content-Type: X-avro/binary
>
>
> ... or some variation on above on each and every RPC?

More or less. Except we can probably arrange to omit most of those response headers except Content-Length. Are any others strictly required?

Today I implemented a simple HTTP-based transport for Avro:

https://issues.apache.org/jira/browse/AVRO-129

In some simple benchmarks I am able to make over 5000 sequential RPCs/second, each with ~100 bytes of response payload. Increasing response payloads to 100kB slows this to around 2500 RPCs/second, giving throughput of 250MB/second, or 2.5Gbit/s. This is with both client and server running on my laptop. The client is java.net.URLConnection and the server is Jetty with its default configuration.

Doug
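[Editor's sketch] The throughput figures above can be checked with a few lines of arithmetic. Assuming 100kB means 100,000 bytes, 2500 RPCs/second is 250MB/second, which is 2.0Gbit/s at 8 bits per byte. The quoted 2.5Gbit/s matches counting 10 bits per byte, a common rule of thumb for on-wire overhead; whether that was the intent is not stated in the thread. The function name is hypothetical.

```python
def throughput(rpcs_per_second, payload_bytes, bits_per_byte=8):
    """Return (MB/s, Gbit/s) for a given call rate and response payload size."""
    bytes_per_second = rpcs_per_second * payload_bytes
    megabytes = bytes_per_second / 1e6
    gigabits = bytes_per_second * bits_per_byte / 1e9
    return megabytes, gigabits


# 100kB responses at the reported rate, at 8 and at 10 bits per byte.
large_8 = throughput(2500, 100_000)
large_10 = throughput(2500, 100_000, bits_per_byte=10)
# ~100-byte responses at the reported rate for small calls.
small_8 = throughput(5000, 100)
```

For the small-payload case the payload throughput is only about 0.5MB/second, so calls per second, not bandwidth, is the meaningful number there.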
-
Re: HTTP transport? Doug Cutting 2009-09-29, 23:35
Raghu Angadi wrote:
> Does this mean current Avro RPC transport (an improved version of Hadoop
> RPC) can still exist as long as it is supported by developers?

Sure, folks can create new transports for Avro. There is, for example, in Hadoop Common some code that tunnels Avro RPCs inside Hadoop RPCs.

> Where does security lie : Avro or Transport layer?

That's not yet clear. If we settle on HTTP as the preferred transport, then the transport should probably handle security, since many security standards already exist for HTTP and many HTTP servers and clients already support adding new security mechanisms. I'd rather not re-invent all this in Avro if we can avoid it.

> If it is part of transport : How does an app get hold of required
> information (e.g. user identity).

Perhaps the way we currently do this in the RPC server, with thread locals? For example, the Avro RPC servlet could have a static method that returns the value of HttpServletRequest#getUserPrincipal().

> Maybe 'transceiver' can have an interface that can transfer security
> information between transport layer and Avro.

Yes, we could add methods like getPrincipal() to Transceiver, but we'd still probably need to use a thread local accessed by a static method to get the Transceiver if we continue to use reflection for server implementations. Or we could stray from reflection, and make services implement an interface through which we can pass them things like the principal.

Doug
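[Editor's sketch] The thread-local approach mentioned above can be illustrated in Python with threading.local, as an analog of a Java ThreadLocal read through a static method. This is not Avro's or Hadoop's API: current_principal(), handle_request() and whoami() are hypothetical names, and in a servlet the principal would come from the request rather than from an argument.

```python
import threading

_current = threading.local()


def current_principal():
    """Static-style accessor a service implementation could call."""
    return getattr(_current, "principal", None)


def handle_request(principal, service_method, *args):
    """Transport-side wrapper: publish the caller's identity for this thread,
    invoke the service method, then clear it again."""
    _current.principal = principal
    try:
        return service_method(*args)
    finally:
        _current.principal = None


def whoami():
    # A service method that was found by reflection has no parameter for the
    # principal, so it reads it from the thread-local instead.
    return current_principal()
```

The alternative raised in the same message, passing the principal through an interface the service implements, would avoid the hidden per-thread state at the cost of changing how services are written.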
-
Re: HTTP transport? Devaraj Das 2009-09-29, 23:57
Out of curiosity, do we have such numbers for the current hadoop RPC?
-
Re: HTTP transport? Scott Carey 2009-09-30, 01:37
BTW, java.net.URLConnection is the likely bottleneck there - it stinks performance-wise. The Apache Commons HTTP client is much faster. Try out using JMeter and switch from one connector to the other for an example.
-
Re: HTTP transport? Scott Carey 2009-09-30, 03:06
On 9/29/09 2:57 PM, "stack" <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>>
>> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
>> stuff out of Avro's payload into HTTP headers. The downside of that would
>> be that, if we still wish to support non-HTTP transports, we'd end up with
>> duplicated logic.
>>
>
> There would be loads of upside I'd imagine if there was a natural mapping of
> avro payload specifiers and metadata up into http headers in terms of
> visibility
>

There are some very serious disadvantages to headers if overused.

I highly advise keeping what goes into the URL and headers very specific to support well defined features for this specific transport type. Otherwise, put it in the data payload for all transports.

A couple header disadvantages:
* Limited character set allowed. You can't put any data in there you want, and you can end up with an inefficient encoding mess that is not easy to read.
* Headers don't take advantage of other transport features. For example, Content-Encoding:gzip provides gzip compression support for the data payload, but you can't compress the headers in HTTP.

On the other hand, custom headers can be handy ways to implement transport specific handshakes or advertise capabilities (which helps build in cross-version compatibility).
But browsers only work with the standard ones, so whatever 'browser requirement' is out there is going to be a limited subset no matter how you do it.

This thread brings up the security features. Payload encryption does not seem to be a transport feature -- but it could be done via something like Content-Encoding (X-Avro-Content-Encrypted?). It seems to fit better IMO within the payload itself, or at the socket / network level via SSH or a secure tunnel.

Authentication is a better fit for the transport layer -- but as mentioned elsewhere if it has to be done for all transports, could it fit in the payload somehow?

>
> So, are we talking about doing something like the following for a
> request/response:
>
> GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
> Host: www.example.com
>
>
> HTTP/1.1 200 OK
> Date: Mon, 23 May 2005 22:38:34 GMT
> Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
> Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
> Etag: "3f80f-1b6-3e1cb03b"
> Accept-Ranges: bytes
> Content-Length: 438
> Connection: close
> Content-Type: X-avro/binary
>

It's acceptable to drop a lot of the headers above. Some of them are useful to implement extended functionality -- the Etag can be used for caching if that were desired, for example. Keep-Alive connections and chunked responses are nice built-ins too.

>
> ... or some variation on above on each and every RPC?
>
> St.Ack
>
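[Editor's sketch] The Content-Encoding point above, that the body can be gzip-compressed while the header block cannot, can be shown with a minimal Python sketch. The helper names are hypothetical and only two headers are emitted; this is an illustration of the asymmetry, not a complete HTTP implementation.

```python
import gzip


def gzip_response(payload: bytes) -> bytes:
    """Build a response whose body is gzip-compressed; the head stays plain text."""
    body = gzip.compress(payload)
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Encoding: gzip\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def split_response(raw: bytes):
    """Split at the first blank line into (head, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body
```

Anything moved from the payload into headers therefore travels uncompressed on every call, which is one reason to keep headers to transport-specific features.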
-
Re: HTTP transport? Ryan Rawson 2009-09-30, 03:20
I wanted to chime in on a few things, since avro is a candidate for the HBase RPC.

I am not sure that "browser compatibility" is a legitimate requirement for this kind of thing. It is at odds with high performance in a number of areas, and isn't the driving factor for using HTTP anyways.

Security - you can get the advantage of security standards, such as the X.509 SSL cert, without actually using HTTPS.

Headers - I don't really think providing a caching mechanism built into the RPC layer is a top requirement. We'd then have to build in a GET/POST idempotent flag into the Avro IDL, and everyone would have to get it right, etc. Considering my top requirement is "make bulk data access and RPC rate/sec as high as possible", I'm not sure caching fits in here, and it can work against that.
-
Re: HTTP transport?Sanjay Radia 2009-09-30, 23:04
On Sep 29, 2009, at 2:08 PM, Doug Cutting wrote: > ... > > Alternately, we could try to make Avro's RPC more HTTP-friendly, and > pull stuff out of Avro's payload into HTTP headers. The downside of > that would be that, if we still wish to support non-HTTP transports, > we'd end up with duplicated logic. > I would prefer to retain layer independence so that we can use other transports. (I am still not sold on HTTP as a transport so far but am listening with an open mind). > >
-
Re: HTTP transport?Sanjay Radia 2009-10-05, 16:41
On Sep 29, 2009, at 12:43 PM, Doug Cutting wrote:
> Sanjay Radia wrote:
> > Wrt connection pooling/async servers: Can't we use the same libraries
> > that Jetty and Tomcat use?
> > Grizzly?
>
> Grizzly also supports HTTP. Choosing Grizzly is independent of choosing
> HTTP as a wire transport or choosing a server.

Agreed. Hence the main advantages that remain for HTTP transport are:
1) A language-independent spec for the protocol. The message headers will be in Avro, so that is easy, and the message exchange should be fairly straightforward. I see this as a minor advantage for using HTTP transport.
2) Code to implement the transport in multiple languages.

(2) is a significant advantage. Once we put in the security modifications, will it remain that portable? We should look at that more closely.

What about out-of-order exchange? Will we be able to support that with HTTP transport?

sanjay
-
Re: HTTP transport?Eric Sammer 2009-10-05, 20:43
Doug Cutting wrote:
> More or less. Except we can probably arrange to omit most of those
> response headers except Content-Length. Are any others strictly required?

Content-Type and Server are probably unavoidable. Some of the others are extremely helpful during development / debugging / etc. It depends on how "open" you are about HTTP being the transport (i.e. do you let developers augment these headers to support additional features, etc.). This may not make sense in the context of something specialized like Avro transport.

> I today implemented a simple HTTP-based transport for Avro:
>
> https://issues.apache.org/jira/browse/AVRO-129
>
> In some simple benchmarks I am able to make over 5000 sequential
> RPCs/second, each with ~100 bytes of response payload.

Just out of curiosity, were you using HTTP keep-alive? During testing on a project a few years ago, I found a huge difference if keep-alive is supported. In retrospect, that should have been obvious. I'd imagine the usage pattern here would be a large number of repeated calls between the same client / server within a short period of time; perfect for KA.

Regards.
--
Eric Sammer
[EMAIL PROTECTED]
http://esammer.blogspot.com
-
Re: HTTP transport?Ryan Rawson 2009-10-05, 20:47
I have a question about these headers... will they impact the ability to do
many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000 rpcs/second. Would this help or hinder? On Oct 5, 2009 4:44 PM, "Eric Sammer" <[EMAIL PROTECTED]> wrote: Doug Cutting wrote: > More or less. Except we can probably arrange to omit most of those > response... Content-Type and Server are probably unavoidable. Some of the others are extremely helpful during development / debugging / etc. It depends on how "open" you are about HTTP being the transport (i.e. do you let developers augment these headers to support additional features, etc.). This may not make sense in the context of something specialized like Avro transport. > I today implemented a simple HTTP-based transport for Avro: > > https://issues.apache.org/jira... Just out of curiousity, were you using HTTP keep alive? During testing on a project a few years ago, I found a huge difference if Keep Alive is supported. In retrospect, that should have been obvious. I'd imagine the usage pattern here would be a large number of repeated calls between the same client / server within a short period of time; perfect for KA. Regards. -- Eric Sammer [EMAIL PROTECTED] http://esammer.blogspot.com
-
Re: HTTP transport?Eric Sammer 2009-10-05, 20:53
Ryan:
Certainly keep alive will help in this case, if that's what you're referring to. The server holds the socket for N seconds or M requests, which ever comes first. What you're saving with KA is the connection setup / tear down. If you have a lot of cases where the client makes a single request and goes away, then KA hurts because the server holds the connection for the KA timeout (N seconds). This *really* helps if you're using TLS due to the additional connection setup overhead. It's my opinion and experience that KA helps greatly in the case of many exchanges between a small to medium number of clients and a server such as RPC. The anti-example is an ad server or web beacon server, for instance. Regards. Ryan Rawson wrote: > I have a question about these headers... will they impact the ability to do > many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000 > rpcs/second. Would this help or hinder? > > On Oct 5, 2009 4:44 PM, "Eric Sammer" <[EMAIL PROTECTED]> wrote: > > Doug Cutting wrote: > More or less. Except we can probably arrange to omit > most of those > response... > Content-Type and Server are probably unavoidable. Some of the others are > extremely helpful during development / debugging / etc. It depends on > how "open" you are about HTTP being the transport (i.e. do you let > developers augment these headers to support additional features, etc.). > This may not make sense in the context of something specialized like > Avro transport. > >> I today implemented a simple HTTP-based transport for Avro: > > > https://issues.apache.org/jira... > Just out of curiousity, were you using HTTP keep alive? During testing > on a project a few years ago, I found a huge difference if Keep Alive is > supported. In retrospect, that should have been obvious. I'd imagine the > usage pattern here would be a large number of repeated calls between the > same client / server within a short period of time; perfect for KA. > > Regards. 
> -- > Eric Sammer > [EMAIL PROTECTED] > http://esammer.blogspot.com > -- Eric Sammer [EMAIL PROTECTED] http://esammer.blogspot.com
-
Re: HTTP transport?Ryan Rawson 2009-10-05, 20:57
That's good to know. I thought ka would help... but I was also talking about
the overhead of a header where the payload is smaller than the framing. Eg: 8 byte requests, excluding which rpc. This seems like we could be hurt since the headers are potentially 5x the size of our payload/request params. On Oct 5, 2009 4:54 PM, "Eric Sammer" <[EMAIL PROTECTED]> wrote: Ryan: Certainly keep alive will help in this case, if that's what you're referring to. The server holds the socket for N seconds or M requests, which ever comes first. What you're saving with KA is the connection setup / tear down. If you have a lot of cases where the client makes a single request and goes away, then KA hurts because the server holds the connection for the KA timeout (N seconds). This *really* helps if you're using TLS due to the additional connection setup overhead. It's my opinion and experience that KA helps greatly in the case of many exchanges between a small to medium number of clients and a server such as RPC. The anti-example is an ad server or web beacon server, for instance. Regards. Ryan Rawson wrote: > I have a question about these headers... will they impact the ability to do > ... -- Eric Sammer [EMAIL PROTECTED] http://esammer.blogspot.com
-
Re: HTTP transport?Eric Sammer 2009-10-05, 21:13
Ryan Rawson wrote:
> That's good to know. I thought ka would help... but I was also talking about > the overhead of a header where the payload is smaller than the framing. Eg: > 8 byte requests, excluding which rpc. This seems like we could be hurt since > the headers are potentially 5x the size of our payload/request params. > Oh, got you. That's the classic SOAP problem. ;) I think it's possible, but to what degree I couldn't be sure. HTTP has that kind of overhead because of the generality. I think you'd get that with anything that isn't specifically designed to be wire efficient. Of course, you wind up having to do what Doug originally mentioned; rebuilding and maintaining the original stuff that HTTP (and the supporting clients) already supports. If over-optimization is in fact the root of all evil (which I've heard once or twice) then maybe it makes sense to start simple and iterate if necessary. In other words, say screw it, use HTTP, strip unnecessary headers, but design the code such that Avro's transport is interface based in case it needs to change. I think prototyping an Avro transport with HTTP, optimizing, and then dropping lower level if necessary is a better approach than going straight to the latter. All of that said, I don't have the insight into the code base that some of the other folks do. This is based on my experience with similar high throughput systems, but I wouldn't say I'm 100% convinced it applies here as the payloads in those systems were bigger than 8 bytes. -- Eric Sammer [EMAIL PROTECTED] http://esammer.blogspot.com
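To put a rough number on the framing concern discussed here (an 8-byte request carried inside HTTP headers), the following is a minimal Python sketch. The request line, path, host name, and header set are assumptions picked for illustration; the `X-avro/binary` content type is borrowed from the example earlier in the thread. A real client may well send more headers than this.

```python
def http_framing(payload, path="/avro", host="example.com"):
    """Return (header_bytes, header_bytes / payload_bytes) for a minimal POST.

    The header set is a hypothetical minimum, not what any particular
    Avro transport actually sends. The payload must be non-empty.
    """
    head = (
        "POST {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Content-Type: X-avro/binary\r\n"
        "Content-Length: {}\r\n"
        "\r\n"
    ).format(path, host, len(payload)).encode("ascii")
    return len(head), len(head) / len(payload)


# An 8-byte request body, as in the example above.
header_bytes, ratio = http_framing(b"\x00" * 8)
print(header_bytes, round(ratio, 1))
```

Even with this stripped-down header set the framing is well over 5x the 8-byte payload, which is in line with the estimate above. The ratio falls quickly as the payload grows, so the concern applies mainly to very small calls.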
-
Re: HTTP transport?Doug Cutting 2009-10-05, 23:48
Sanjay Radia wrote:
> What about out of order exchange. Will we be able to support that with > http transport? Out-of-order exchange was originally added to Hadoop's RPC when it was a part of Nutch. It's an important optimization for distributed search, but it's not clear how important it is currently to Hadoop. That said, the simple way to deal with this in HTTP is to use a client library that pools connections, so that, if a second request to the same service is made by another thread in the same client process before the first has returned, a second connection is opened. If this is common, the high-water mark of connections on the server will be higher. However with an async-io-based server, the number of connections should not be a primary bottleneck. And again, we don't know how common this is. Doug
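The pooling behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ConnectionPool` and its `connect` factory are hypothetical names, and a plain `object()` stands in for a real HTTP connection. It shows that overlapping requests from one client open extra connections (raising the high-water mark), while sequential requests reuse an idle one.

```python
import threading


class ConnectionPool:
    """Hands out an idle connection, opening a new one only when none is idle."""

    def __init__(self, connect):
        self._connect = connect      # hypothetical factory returning a new connection
        self._idle = []
        self._lock = threading.Lock()
        self.opened = 0              # total connections ever opened
        self.in_use = 0              # connections currently checked out
        self.high_water = 0          # most connections checked out at once

    def acquire(self):
        with self._lock:
            if self._idle:
                conn = self._idle.pop()
            else:
                conn = self._connect()
                self.opened += 1
            self.in_use += 1
            self.high_water = max(self.high_water, self.in_use)
            return conn

    def release(self, conn):
        with self._lock:
            self.in_use -= 1
            self._idle.append(conn)


pool = ConnectionPool(connect=object)  # object() stands in for a real connection
first = pool.acquire()                 # request 1 is in flight
second = pool.acquire()                # request 2 overlaps, so a second connection opens
pool.release(first)
pool.release(second)
third = pool.acquire()                 # a later sequential request reuses an idle connection
pool.release(third)
print(pool.opened, pool.high_water)
```

The sketch deliberately has no upper bound on connections; a production pool would normally cap connections per host and block or fail when the cap is reached.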
-
Re: HTTP transport?Scott Carey 2009-10-06, 02:59
On 10/5/09 1:53 PM, "Eric Sammer" <[EMAIL PROTECTED]> wrote: > Ryan: > > Certainly keep alive will help in this case, if that's what you're > referring to. The server holds the socket for N seconds or M requests, > which ever comes first. What you're saving with KA is the connection > setup / tear down. If you have a lot of cases where the client makes a > single request and goes away, then KA hurts because the server holds the > connection for the KA timeout (N seconds). This *really* helps if you're > using TLS due to the additional connection setup overhead. > > It's my opinion and experience that KA helps greatly in the case of many > exchanges between a small to medium number of clients and a server such > as RPC. The anti-example is an ad server or web beacon server, for instance. > Even in the beacon case, if the browser is likely to send another request shortly, it cuts the effective network latency in half. Establishing a TCP connection is at minimum one full round trip -- before the request. If latency is important KeepAlive is useful as long as a second request is expected in a short enough time. As long as the server is not process or thread per connection, one can scale up connection count rather high (20k) if necessary. With respect to Avro/Hadoop, I suspect requests from clients to be time clustered. > Regards. > > Ryan Rawson wrote: >> I have a question about these headers... will they impact the ability to do >> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000 >> rpcs/second. Would this help or hinder? >> >> On Oct 5, 2009 4:44 PM, "Eric Sammer" <[EMAIL PROTECTED]> wrote: >> >> Doug Cutting wrote: > More or less. Except we can probably arrange to omit >> most of those > response... >> Content-Type and Server are probably unavoidable. Some of the others are >> extremely helpful during development / debugging / etc. It depends on >> how "open" you are about HTTP being the transport (i.e. 
do you let >> developers augment these headers to support additional features, etc.). >> This may not make sense in the context of something specialized like >> Avro transport. >> >>> I today implemented a simple HTTP-based transport for Avro: > > >> https://issues.apache.org/jira... >> Just out of curiousity, were you using HTTP keep alive? During testing >> on a project a few years ago, I found a huge difference if Keep Alive is >> supported. In retrospect, that should have been obvious. I'd imagine the >> usage pattern here would be a large number of repeated calls between the >> same client / server within a short period of time; perfect for KA. >> >> Regards. >> -- >> Eric Sammer >> [EMAIL PROTECTED] >> http://esammer.blogspot.com >> > > > -- > Eric Sammer > [EMAIL PROTECTED] > http://esammer.blogspot.com >
-
Re: HTTP transport?Eric Sammer 2009-10-06, 03:15
Scott Carey wrote:
> Even in the beacon case, if the browser is likely to send another request
> shortly, it cuts the effective network latency in half.

Which is generally not the case in the beacon / ad server use case. That was the only point I was making. That's beside the point, though. I think we both agree that KA for something like Avro transport is probably good.

> Establishing a TCP
> connection is at minimum one full round trip -- before the request. If
> latency is important KeepAlive is useful as long as a second request is
> expected in a short enough time.
>
> As long as the server is not process or thread per connection, one can scale
> up connection count rather high (20k) if necessary.
>
> With respect to Avro/Hadoop, I suspect requests from clients to be time
> clustered.

That was my thought as well. The thing that gets me is that in the case of Hadoop (and the related subprojects) the number of clients utilizing this particular HTTP connection is probably going to be pretty small (maybe low thousands?). This is even better for keep-alive as there's a solid chance you're going to have a high reuse rate. Of course, I'm assuming we're talking about things like name node to data node, hbase client to region servers, and those types of communications. Even if you just used Jetty (or any other thin HTTP 1.1 container that supports KA), one should easily be able to see good performance.

Regards.
--
Eric Sammer
[EMAIL PROTECTED]
http://esammer.blogspot.com
-
Re: HTTP transport?Scott Carey 2009-10-06, 03:19
On 10/5/09 1:47 PM, "Ryan Rawson" <[EMAIL PROTECTED]> wrote:
> I have a question about these headers... will they impact the ability to do
> many, but small, rpcs? Imagine you'd need to support 5,000 to 50,000
> rpcs/second. Would this help or hinder?

As long as the HTTP response and request fit in one network packet (pessimistic - 1KB or so) there is not much overhead. 50k rpcs/sec with gigabit ethernet saturated (~100MB/sec) is ~2KB per request. So, on faster networks an extra 100 to 200 bytes or so won't matter.

On the WAN, it will have more of an effect if the bandwidth is low, the latency is also very low, and the RPC is very 'chatty' rather than 'chunky'. However, on most WAN links network latency is going to kill you far, far more than an extra 200 bytes. For example, imagine a 20ms latency link. The max RPC throughput to a single client is then 50/sec (one per 20ms). With a 1k payload per request, that's 50k per sec max data transfer. HTTP pipelining could help here -- but isn't as well supported as one would like.

If WAN level RPC is a goal, the main challenges there will be latency related first, and packet size related second. On a fast local network (gigabit) I suspect throughput problems of other sorts to be the issue before bandwidth from slightly larger packets.

Furthermore, it's not like a TCP packet is 0 bytes on its own. HTTP adds some overhead, but it can be kept relatively trim.
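The back-of-the-envelope figures in this message can be reproduced with a short Python sketch. The function names are made up for illustration, and the 20ms figure is treated as the full round-trip time with one call in flight at a time, which is how the message uses it.

```python
def bytes_per_request(link_bytes_per_sec, rpcs_per_sec):
    """Bandwidth budget available to each RPC on a saturated link."""
    return link_bytes_per_sec / rpcs_per_sec


def max_sequential_rpcs_per_sec(round_trip_ms):
    """Upper bound for one client issuing one call at a time."""
    return 1000 / round_trip_ms


lan_budget = bytes_per_request(100_000_000, 50_000)  # ~100MB/sec link, 50k rpcs/sec
wan_rate = max_sequential_rpcs_per_sec(20)           # 20ms link
wan_bytes_per_sec = wan_rate * 1_000                 # 1k payload per request
print(lan_budget, wan_rate, wan_bytes_per_sec)
```

This gives 2000 bytes per request on the LAN, 50 calls per second on the 20ms link, and 50,000 bytes per second of payload, matching the ~2KB, 50/sec and 50k per sec figures quoted. An extra 100 to 200 bytes of headers is 5 to 10 percent of the LAN budget, while on the WAN the round trip dominates regardless of header size.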
-
Re: HTTP transport?Scott Carey 2009-10-06, 03:30
>> With respect to Avro/Hadoop, I suspect requests from clients to be time >> clustered. > > That was my thought as well. The thing that gets me is that in the case > of Hadoop (and the related subprojects) the clients utilizing this > particular HTTP connection are probably going to be pretty small (maybe > low thousands?). This is even better for keep alive as there's a solid > chance you're going to have a high reuse rate. Of course, I'm assuming > we're talking about things like name node to data node, hbase client to > region servers, and those types of communications. Even if you just used > Jetty (or any other thin HTTP 1.1 container that supports KA), one > should easily be able to see good performance. > Absolutely. Additionally, I realized one more thing -- when a client knows they aren't likely to send another request soon, they can send a request with Connection: close. Well behaved clients can help maximize the benefit. > Regards. > -- > Eric Sammer > [EMAIL PROTECTED] > http://esammer.blogspot.com >
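As a rough illustration of the keep-alive behaviour discussed in this exchange, the following Python sketch sends several small requests over one persistent connection and marks the last one with `Connection: close`. Everything here is a stand-in: the echo server, the `/avro` path, the `X-avro/binary` content type (taken from the example earlier in the thread), and the helper names `start_echo_server` and `call_many` are all hypothetical, and none of it is Avro's actual transport.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingEchoHandler(BaseHTTPRequestHandler):
    """Echoes POST bodies and counts how many TCP connections were accepted."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        type(self).connections += 1
        super().setup()

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_echo_server():
    """Start the demo server on an ephemeral localhost port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_many(host, port, payloads):
    """Send every payload over one connection; mark the last request as final."""
    conn = http.client.HTTPConnection(host, port)
    replies = []
    try:
        for i, payload in enumerate(payloads):
            headers = {"Content-Type": "X-avro/binary"}
            if i == len(payloads) - 1:
                headers["Connection"] = "close"  # tell the server we are done
            conn.request("POST", "/avro", body=payload, headers=headers)
            replies.append(conn.getresponse().read())
    finally:
        conn.close()
    return replies
```

To try it, start the server with `start_echo_server()`, pass `server.server_address[1]` as the port to `call_many`, and then inspect `CountingEchoHandler.connections`: a batch of calls should register as a single accepted connection, which is the setup and teardown cost that keep-alive saves.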
-
Re: HTTP transport?Owen O'Malley 2009-10-08, 22:10
I still don't see how to make this play well with security. Security
needs to go under the transport layer so that it is easy to add encryption on the wire. If you go with HTTP, the only way that is portable at all is to use HTTP over SSL. SSL is for when there aren't shared keys and Kerberos provides those shared keys. SPNEGO is the standard method of using Kerberos with HTTP and we are planning to use that for the web UI's. But SPNEGO is very much the least painful of the alternatives and I'd rather not force our RPC services into that corner. I also have serious doubts about performance, but that is hard to answer until we have code to test. It is an interesting question how much we depend on being able to answer queries out of order. There are some parts of the code where overlapping requests from the same client matter. In particular, the terasort scheduler uses threads to access the namenode. That would stop providing any pipelining, which I believe would be significant. In short, I think that an HTTP transport is great for playing with, but I don't think you can assume it will work as the primary transport. -- Owen
-
Re: HTTP transport?Doug Cutting 2009-10-09, 17:49
Owen O'Malley wrote:
> SPNEGO is the
> standard method of using Kerberos with HTTP and we are planning to use
> that for the web UI's.

Java 6 also supports using SPNEGO for RPC over HTTP out of the box:

http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html

> I also have serious doubts about performance, but that is hard to answer
> until we have code to test.

The good news is that, since the HTTP stuff is already implemented, we can test its performance easily. Performance of insecure access over HTTP looks good so far. It's an open question how much HTTP-based security will slow things versus non-HTTP-based security.

> It is an interesting question how much we
> depend on being able to answer queries out of order. There are some
> parts of the code where overlapping requests from the same client
> matter. In particular, the terasort scheduler uses threads to access the
> namenode. That would stop providing any pipelining, which I believe
> would be significant.

No, we wouldn't stop any pipelining, we'd just use more connections to implement it. With HttpClient one can limit the number of pooled connections per host:

http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29

Connections are not free of course, but Jetty has been benchmarked at 20,000 concurrent connections:

http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/

> In short, I think that an HTTP transport is great for playing with, but
> I don't think you can assume it will work as the primary transport.

I agree, we cannot assume it. But it's easy to try it and see how it fares. Any investment in getting it working is perhaps not wasted, since, besides providing a performance baseline, it also may be useful to provide HTTP-based access to services even if a higher-performance option is implemented.

Doug
-
Re: HTTP transport?Sanjay Radia 2009-10-09, 18:13
On 10/9/09 10:49 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Owen O'Malley wrote: >> SPNEGO is the >> standard method of using Kerberos with HTTP and we are planning to use >> that for the web UI's. > > Java 6 also supports using SPNEGO for RPC over HTTP out of the box: > > http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html > >> I also have serious doubts about performance, but that is hard to answer >> until we have code to test. > > The good news is that, since the HTTP stuff is already implemented, we > can test its performance easily. Performance of insecure access over > HTTP looks good so far. It's an open question are how much HTTP-based > security will slow things versus non-HTTP-based security. > >> It is an interesting question how much we >> depend on being able to answer queries out of order. There are some >> parts of the code where overlapping requests from the same client >> matter. In particular, the terasort scheduler uses threads to access the >> namenode. That would stop providing any pipelining, which I believe >> would be significant. > > No, we wouldn't stop any pipelining, we'd just use more connections to > implement it. With HttpClient one can limit the number of pooled > connnections per host: > > http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/Mult > iThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29 > > Connections are not free of course, but Jetty has been benchmarked at > 20,000 concurrent connections: > > http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/ > >> In short, I think that an HTTP transport is great for playing with, but >> I don't think you can assume it will work as the primary transport. > > I agree, we cannot assume it. But it's easy to try it and see how it > fares. 
> Any investment in getting it working is perhaps not wasted,
> since, besides providing a performance baseline, it also may be useful
> to provide HTTP-based access to services even if a higher-performance
> option is implemented.

Will the RPC over HTTP be transparent so that we can replace it with a different layer if needed?

My worry was the separation of data and checksums; someone had mentioned that one could do this over 2 RPCs - that is not transparent.

Also the other issue is porting from data transfer socket streams to RPC - that port will not be transparent. We cannot afford to lose performance over that change. Further, moving from streaming sockets to RPC is a very significant code change to the dfs-client and data nodes. I assume that we are going to create a branch that moves the data transfer protocols to RPC and test the performance and if it is good then we commit and move to RPC?

I am worried about this part - I am surprised that you two are not. Am I missing something here?

sanjay

> Doug
-
Re: HTTP transport?Doug Cutting 2009-10-09, 19:56
Sanjay Radia wrote:
> Will the RPC over HTTP be transparent so that that we can replace with a > different layer if needed? Yes. > My worry was the separation of data and checksums; someone had mentioned > that one could do this over 2 RPCs - that is not transparent. That was suggested as a possibility if we did not want to use RPC for data, but rather raw HTTP, e.g., with a separate URL per block. The zerocopy support built into most HTTP servers only supports entire responses from a single file, so if we wanted to take advantage of these zerocopy implementations we'd not use RPC for block access, but could use HTTP and hence share security, etc. Using raw HTTP for block access might also perform better, since it can use TCP flow control, rather than RPC call/response. In my microbenchmarks, RPC call/response was fast enough to easily saturate disks and networks, so that might be moot, although RPC call/response for file data may use more CPU than we'd like. With our own transport implementation we could get RPC call/response to use zerocopy for file data. > I assume that we > going to create a branch that moves the data transfer protocols to RPC and > test the performance and if it is good then we commit and move to RPC? Yes. We obviously cannot change the file data transfer protocol without benchmarking. Ideally file data transfer can share as much as possible with other protocols. The most optimistic approach would be to use HTTP-based RPC call/response, so we ought to benchmark that. This was the purpose of my recently-reported microbenchmarks. We also need to determine whether both TCP flow-control and zerocopy are critical to data file performance. If both are indeed critical, and HTTP proves sufficient for everything else, then we should consider using non-RPC HTTP for file data transfer, since it supports both zerocopy and TCP-based flow control, and the implementation of security, etc. could be shared. 
But, on the other hand, if HTTP is deemed inappropriate for security and we develop our own RPC transport that permits zerocopy, and TCP flow-control over entire blocks is not required, then we might use RPC for file data. What I'm hoping we can avoid is, as today, using different transports for different protocols, re-implementing security, connection pooling, async request processing, etc. for each, requiring separate configuration and ports for each, etc. But even that might be required. We don't know yet. I think starting with HTTP as a hypothesis permits us to make progress without a lot of up-front investment. Doug
-
Re: HTTP transport?Scott Carey 2009-10-11, 01:11
On 10/9/09 10:49 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
>> It is an interesting question how much we
>> depend on being able to answer queries out of order. There are some
>> parts of the code where overlapping requests from the same client
>> matter. In particular, the terasort scheduler uses threads to access the
>> namenode. That would stop providing any pipelining, which I believe
>> would be significant.
>
> No, we wouldn't stop any pipelining, we'd just use more connections to
> implement it. With HttpClient one can limit the number of pooled
> connections per host:

Also, since HTTP supports in-order pipelining out of the box, it's only out-of-order stuff that would require additional connections.

> Doug

Requirements may end up ruling out HTTP, but I doubt that performance (in the insecure case) will be the cause since there are so many high performance client and server implementations available.

Consider something lower level than the Servlet API for the server side -- it is baggage-laden and does not allow access to all data in unconverted form or any asynchronous i/o. In this respect, Jetty has lower level, light-weight API access points.

http://docs.codehaus.org/display/JETTY/Architecture

If HTTP is not used, I suggest a strong look at Apache MINA for constructing high performance NIO clients and servers with Java:

http://mina.apache.org/
-
Re: HTTP transport?Kan Zhang 2009-10-14, 01:59
On 10/9/09 12:56 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Sanjay Radia wrote: >> Will the RPC over HTTP be transparent so that that we can replace with a >> different layer if needed? > > Yes. > >> My worry was the separation of data and checksums; someone had mentioned >> that one could do this over 2 RPCs - that is not transparent. > > That was suggested as a possibility if we did not want to use RPC for > data, but rather raw HTTP, e.g., with a separate URL per block. The > zerocopy support built into most HTTP servers only supports entire > responses from a single file, so if we wanted to take advantage of these > zerocopy implementations we'd not use RPC for block access, but could > use HTTP and hence share security, etc. Using raw HTTP for block access > might also perform better, since it can use TCP flow control, rather > than RPC call/response. In my microbenchmarks, RPC call/response was > fast enough to easily saturate disks and networks, so that might be > moot, although RPC call/response for file data may use more CPU than > we'd like. With our own transport implementation we could get RPC > call/response to use zerocopy for file data. > One problem I see with using HTTP is that it's expensive to provide data encryption. We're currently adding 2 authentication mechanisms (Kerberos and DIGEST-MD5) to our existing RPC. Both of them can provide data encryption for subsequent communication over the authenticated channel. However, when similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP DIGEST, respectively), they don't provide data encryption (correct me if I'm wrong). For data encryption over HTTP, one has to use SSL, which is expensive. Kan
-
Re: HTTP transport?Doug Cutting 2009-10-14, 16:37
Kan Zhang wrote:
> One problem I see with using HTTP is that it's expensive to provide data > encryption. We're currently adding 2 authentication mechanisms (Kerberos and > DIGEST-MD5) to our existing RPC. Both of them can provide data encryption > for subsequent communication over the authenticated channel. However, when > similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP > DIGEST, respectively), they don't provide data encryption (correct me if I'm > wrong). For data encryption over HTTP, one has to use SSL, which is > expensive. Java supports using Kerberos-based encryption for TLS (nee SSL): http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB http://tools.ietf.org/html/rfc2712 There's also a standard way to use tickets over TLS: http://tools.ietf.org/html/rfc4507 Doug
-
Re: HTTP transport?Kan Zhang 2009-10-14, 17:45
On 10/14/09 9:37 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Kan Zhang wrote: >> One problem I see with using HTTP is that it's expensive to provide data >> encryption. We're currently adding 2 authentication mechanisms (Kerberos and >> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption >> for subsequent communication over the authenticated channel. However, when >> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP >> DIGEST, respectively), they don't provide data encryption (correct me if I'm >> wrong). For data encryption over HTTP, one has to use SSL, which is >> expensive. > > Java supports using Kerberos-based encryption for TLS (nee SSL): > > http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB > This addresses part of my concern (the Kerberos part). I wasn't aware Java already supports it. Thanks for pointing it out. Kan
-
Re: HTTP transport?Kan Zhang 2009-11-06, 19:15
On 10/14/09 9:37 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Kan Zhang wrote:
>> One problem I see with using HTTP is that it's expensive to provide data
>> encryption. We're currently adding 2 authentication mechanisms (Kerberos and
>> DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
>> for subsequent communication over the authenticated channel. However, when
>> similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
>> DIGEST, respectively), they don't provide data encryption (correct me if I'm
>> wrong). For data encryption over HTTP, one has to use SSL, which is
>> expensive.
>
> Java supports using Kerberos-based encryption for TLS (nee SSL):
>
> http://java.sun.com/j2se/1.5.0/docs/guide/security/jsse/JSSERefGuide.html#KRB
>
> http://tools.ietf.org/html/rfc2712
>

Thanks for pointing this out. I did a little testing on it. It seems that
when you use Kerberos cipher suites with SSL, the Kerberos service name for
a TLS server has to be literally "host." For example, a TLS server running
on the machine mach1.imc.org in the Kerberos realm IMC.ORG must use
host/[EMAIL PROTECTED] as its Kerberos principal name. I couldn't find a
way to specify a different service name. Can someone confirm this? This can
be a limitation since we typically run DN and TT on the same set of nodes.

Kan
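To make the limitation concrete, here is a small sketch of why a fixed service name is a problem when two daemons share a node. It assumes the usual service/hostname@REALM form for Kerberos service principals. The service names "dn" and "tt" and both function names are hypothetical, chosen only for this illustration, and nothing here talks to a real KDC or TLS stack.

```python
def service_principal(service, hostname, realm):
    """Build a Kerberos service principal in service/hostname@REALM form."""
    return f"{service}/{hostname}@{realm}"


def tls_kerberos_principal(hostname, realm):
    """Principal a TLS server would be forced to use if, as reported in the
    message above, the service name is fixed to the literal string "host"."""
    return service_principal("host", hostname, realm)


if __name__ == "__main__":
    host, realm = "mach1.imc.org", "IMC.ORG"

    # With a free choice of service name, each daemon gets its own identity.
    # "dn" and "tt" are hypothetical service names.
    print(service_principal("dn", host, realm))
    print(service_principal("tt", host, realm))

    # With the service name fixed, both daemons on the node collapse onto
    # one principal and therefore share one key.
    print(tls_kerberos_principal(host, realm))
```

Under that constraint the DataNode and TaskTracker on one machine could not be distinguished by principal, so compromising either daemon's key would expose both.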
-
Re: HTTP transport?Doug Cutting 2009-11-06, 21:06
Kan Zhang wrote:
> Thanks for pointing this out. I did a little testing on it. It seems that
> when you use Kerberos cipher suites with SSL, the Kerberos service name for
> a TLS server has to be literally "host." For example, a TLS server running
> on the machine mach1.imc.org in the Kerberos realm IMC.ORG must use
> host/[EMAIL PROTECTED] as its Kerberos principal name. I couldn't find a
> way to specify a different service name. Can someone confirm this? This can
> be a limitation since we typically run DN and TT on the same set of nodes.

This is unfortunate. It looks to be part of the specification.

BTW, I found an approach to Kerberos over HTTP bypassing SPNEGO:

http://beamdocs.fnal.gov/DocDB/0019/001987/001/KMJ3_1-guide.pdf

Starting on page 13, he suggests having an applet that the browser loads
to create a ticket. The ticket is created by the user's browser talking
directly to Kerberos. Then the ticket can be used in subsequent requests
to identify the user.

An application using HTTP could similarly contact Kerberos directly to
create tickets that are sent with requests. No multi-step HTTP handshake
is thus required.

Doug
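A rough sketch of the "ticket sent with each request" idea follows, in Python for brevity. It assumes the client has already obtained an opaque ticket from Kerberos by some other means, which is not shown. The header scheme name "Kerberos-Ticket", the URL, and both function names are hypothetical; the thread does not specify a wire format.

```python
import base64
import urllib.request

# Hypothetical scheme name; not taken from any specification.
AUTH_SCHEME = "Kerberos-Ticket"


def build_request(url, ticket):
    """Attach an already-obtained, opaque ticket to a single HTTP request.

    The request is only constructed here, not sent.
    """
    req = urllib.request.Request(url)
    token = base64.b64encode(ticket).decode("ascii")
    req.add_header("Authorization", f"{AUTH_SCHEME} {token}")
    return req


def extract_ticket(header_value):
    """Server side: recover the ticket bytes from the Authorization header."""
    scheme, _, token = header_value.partition(" ")
    if scheme != AUTH_SCHEME:
        raise ValueError("unexpected authorization scheme: " + scheme)
    return base64.b64decode(token)


if __name__ == "__main__":
    # Placeholder bytes standing in for a real ticket.
    req = build_request("http://datanode.example.com/block/1", b"opaque-ticket")
    print(extract_ticket(req.get_header("Authorization")))
```

Because the credential travels with the first request, no challenge/response round trip is needed before data can flow. A real server would still have to validate the ticket with Kerberos before trusting it.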
-
Re: HTTP transport?Patrick Hunt 2009-11-12, 22:22
One additional benefit of using HTTP is that people are always working
to improve performance, and not only optimizing servers -- Google's SPDY:

http://www.readwriteweb.com/archives/spdy_google_wants_to_speed_up_the_web.php

Multiplexed requests, compressed headers, etc...

Patrick

Doug Cutting wrote:
> I'm considering an HTTP-based transport for Avro as the preferred,
> high-performance option.
>
> HTTP has lots of advantages. In particular, it already has
> - lots of authentication, authorization and encryption support;
> - highly optimized servers;
> - monitoring, logging, etc.
>
> Tomcat and other servlet containers support async NIO, where a thread is
> not required per connection. A servlet can process bulk data with a
> single copy to and from the socket (bypassing stream buffers). Calls
> can be multiplexed over a single HTTP connection using Comet events.
>
> http://tomcat.apache.org/tomcat-6.0-doc/aio.html
>
> Zero copy is not an option for servlets that generate arbitrary data,
> but one can specify a file/start/length tuple and Tomcat will use
> sendfile to write the response. That means that while HDFS datanode
> file reads could not be done via RPC, they could be done via HTTP with
> zero-copy. If authentication and authorization are already done in the
> HTTP server, this may not be a big loss. The HDFS client might make two
> HTTP requests, one to read a file's data, and another to read its
> checksums. The server would then stream the entire block to the client
> using sendfile, using TCP flow control as today.
>
> Thoughts?
>
> Doug