Hadoop, mail # general - HTTP transport?

Re: HTTP transport?
Sanjay Radia 2009-10-09, 18:13

On 10/9/09 10:49 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Owen O'Malley wrote:
>> SPNEGO is the
>> standard method of using Kerberos with HTTP and we are planning to use
>> that for the web UI's.
>
> Java 6 also supports using SPNEGO for RPC over HTTP out of the box:
>
> http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html
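
For concreteness, a rough sketch of what that built-in SPNEGO support looks
like from client code. The host name, port, and fallback credentials below are
made-up placeholders, and it assumes a Kerberos ticket is already available in
the local credential cache:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.Authenticator;
    import java.net.HttpURLConnection;
    import java.net.PasswordAuthentication;
    import java.net.URL;

    public class SpnegoClientSketch {
        public static void main(String[] args) throws Exception {
            // Prefer the Negotiate (SPNEGO) scheme over Basic/Digest.
            System.setProperty("http.auth.preference", "SPNEGO");
            // Let JGSS pick up credentials from the Kerberos ticket cache
            // instead of requiring an explicit JAAS Subject.
            System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

            // Fallback authenticator; with a valid ticket cache it is
            // normally not consulted.
            Authenticator.setDefault(new Authenticator() {
                protected PasswordAuthentication getPasswordAuthentication() {
                    return new PasswordAuthentication("hdfs", new char[0]);
                }
            });

            // Hypothetical SPNEGO-protected endpoint, not an actual Hadoop URL.
            URL url = new URL("http://namenode.example.com:50070/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // HttpURLConnection answers the 401 "WWW-Authenticate: Negotiate"
            // challenge transparently before returning the status code.
            System.out.println("HTTP status: " + conn.getResponseCode());

            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
            in.close();
        }
    }
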
>
>> I also have serious doubts about performance, but that is hard to answer
>> until we have code to test.
>
> The good news is that, since the HTTP stuff is already implemented, we
> can test its performance easily.  Performance of insecure access over
> HTTP looks good so far.  It's an open question how much HTTP-based
> security will slow things versus non-HTTP-based security.
>
>> It is an interesting question how much we
>> depend on being able to answer queries out of order. There are some
>> parts of the code where overlapping requests from the same client
>> matter. In particular, the terasort scheduler uses threads to access the
>> namenode. Under an HTTP transport that would stop providing any pipelining,
>> which I believe would be significant.
>
> No, we wouldn't stop any pipelining, we'd just use more connections to
> implement it.  With HttpClient one can limit the number of pooled
> connections per host:
>
> http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29
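
As a rough sketch of the per-host pooling Doug describes (the URL and pool
sizes here are illustrative assumptions, not values anyone has proposed):

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
    import org.apache.commons.httpclient.methods.GetMethod;

    public class PooledClientSketch {
        public static void main(String[] args) throws Exception {
            MultiThreadedHttpConnectionManager manager =
                new MultiThreadedHttpConnectionManager();
            // Cap pooled connections per host (e.g. per namenode) and in total.
            manager.getParams().setDefaultMaxConnectionsPerHost(20);
            manager.getParams().setMaxTotalConnections(100);

            // One client instance can be shared by many threads; each request
            // checks a connection out of the pool and returns it on release.
            HttpClient client = new HttpClient(manager);

            GetMethod get = new GetMethod("http://namenode.example.com:50070/status");
            try {
                int status = client.executeMethod(get);
                System.out.println("HTTP status: " + status);
            } finally {
                get.releaseConnection();
            }
        }
    }
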
>
> Connections are not free of course, but Jetty has been benchmarked at
> 20,000 concurrent connections:
>
> http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/
>
>> In short, I think that an HTTP transport is great for playing with, but
>> I don't think you can assume it will work as the primary transport.
>
> I agree, we cannot assume it.  But it's easy to try it and see how it
> fares.  Any investment in getting it working is perhaps not wasted,
> since, besides providing a performance baseline, it also may be useful
> to provide HTTP-based access to services even if a higher-performance
> option is implemented.

Will the RPC over HTTP be transparent, so that we can replace it with a
different transport layer if needed?
My worry was the separation of data and checksums; someone had mentioned
that one could do this over two RPCs, which is not transparent.
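
To make that concern concrete, here is a hypothetical illustration (not Hadoop
code, and the interface and method names are invented) of why splitting one
streaming exchange into two calls is not a drop-in replacement: every caller
written against the first interface has to change, and the client becomes
responsible for pairing data with checksums itself.

    // Hypothetical interfaces for illustration only.
    interface BlockTransferStreaming {
        // Current style: one streaming exchange carries the block data
        // and its checksums interleaved on the same socket.
        java.io.InputStream readBlock(long blockId, long offset, long length);
    }

    interface BlockTransferTwoRpcs {
        // Two-RPC style: data and checksums are fetched by separate calls,
        // and the client must pair and verify them itself.
        byte[] readBlockData(long blockId, long offset, long length);
        byte[] readBlockChecksums(long blockId, long offset, long length);
    }
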

The other issue is porting the data transfer socket streams to RPC - that
port will not be transparent. We cannot afford to lose performance over that
change. Further, moving from streaming sockets to RPC is a very significant
code change to the dfs-client and the data nodes. I assume we are going to
create a branch that moves the data transfer protocols to RPC, test the
performance, and commit the move only if the numbers hold up?
I am worried about this part - I am surprised that you two are not. Am I
missing something here?

sanjay

>
> Doug