Hadoop >> mail # general >> HTTP transport?


Re: HTTP transport?

On 10/9/09 10:49 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Owen O'Malley wrote:
>> SPNEGO is the
>> standard method of using Kerberos with HTTP and we are planning to use
>> that for the web UIs.
>
> Java 6 also supports using SPNEGO for RPC over HTTP out of the box:
>
> http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html
>
>> I also have serious doubts about performance, but that is hard to answer
>> until we have code to test.
>
> The good news is that, since the HTTP stuff is already implemented, we
> can test its performance easily.  Performance of insecure access over
> HTTP looks good so far.  It's an open question how much HTTP-based
> security will slow things down compared with non-HTTP-based security.
>
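The insecure-HTTP baseline Doug mentions can be roughed out with nothing beyond the JDK. This is a minimal sketch, not Hadoop's HTTP RPC code. It assumes a loopback echo endpoint (`/echo`, a hypothetical path) served by `com.sun.net.httpserver.HttpServer`, sends sequential POSTs with `HttpURLConnection`, and reports the mean round-trip time. The class and method names are invented for illustration. Loopback figures say nothing about real network latency or about the added cost of SPNEGO.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;

// Hypothetical micro-benchmark: mean round-trip time of small echo POSTs over loopback HTTP.
public class HttpRoundTrip {

    /** Sends `calls` sequential echo requests carrying `payload`; returns mean nanoseconds per call. */
    public static long meanNanosPerCall(int calls, byte[] payload) throws IOException {
        if (calls <= 0) {
            throw new IllegalArgumentException("calls must be positive");
        }
        // Port 0 lets the OS pick any free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body); // echo the request body back unchanged
            }
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/echo").toURL();
            long start = System.nanoTime();
            for (int i = 0; i < calls; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestMethod("POST");
                conn.setDoOutput(true);
                try (OutputStream out = conn.getOutputStream()) {
                    out.write(payload);
                }
                try (InputStream in = conn.getInputStream()) {
                    // Reading the reply fully lets the JDK reuse the keep-alive connection.
                    if (in.readAllBytes().length != payload.length) {
                        throw new IOException("short echo reply");
                    }
                }
            }
            return (System.nanoTime() - start) / calls;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        long mean = meanNanosPerCall(1000, new byte[128]);
        System.out.println("mean round trip: " + mean / 1000 + " us");
    }
}
```

Running the same loop against a secured endpoint would be one way to start on the security-overhead question left open above.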
>> It is an interesting question how much we
>> depend on being able to answer queries out of order. There are some
>> parts of the code where overlapping requests from the same client
>> matter. In particular, the terasort scheduler uses threads to access the
>> namenode. That would stop providing any pipelining, which I believe
>> would be significant.
>
> No, we wouldn't stop any pipelining, we'd just use more connections to
> implement it.  With HttpClient one can limit the number of pooled
> connections per host:
>
> http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29
>
> Connections are not free of course, but Jetty has been benchmarked at
> 20,000 concurrent connections:
>
> http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/
>
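Doug's point is that overlapping requests need not be lost, only spread over more connections. That can be sketched with JDK classes alone. This is a hypothetical illustration, not HttpClient's `MultiThreadedHttpConnectionManager`. A `Semaphore` stands in for a per-host connection limit, and a fixed thread pool issues the requests. A deliberately slow server handler records the peak number of requests it sees in flight at once. The class name, the `/slow` path, and the 50 ms service time are assumptions chosen for the demo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class CappedOverlap {

    /** Issues `requests` concurrent GETs, at most `maxPerHost` at a time; returns the server-side peak. */
    public static int peakInFlight(int requests, int maxPerHost) throws Exception {
        if (requests <= 0 || maxPerHost <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        ExecutorService serverThreads = Executors.newCachedThreadPool();
        server.setExecutor(serverThreads); // multi-threaded, so the server does not serialize requests
        server.createContext("/slow", exchange -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(50); // simulated service time (assumption)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.decrementAndGet(); // decrement before replying, while the client still holds its permit
            }
            byte[] body = "ok".getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        ExecutorService clients = Executors.newFixedThreadPool(requests);
        Semaphore permits = new Semaphore(maxPerHost); // stands in for a per-host connection cap
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/slow").toURL();
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                results.add(clients.submit(() -> {
                    permits.acquire(); // wait for a free slot before using a connection
                    try {
                        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                        try (InputStream in = conn.getInputStream()) {
                            in.readAllBytes();
                        }
                        return conn.getResponseCode();
                    } finally {
                        permits.release();
                    }
                }));
            }
            for (Future<Integer> f : results) {
                if (f.get() != 200) {
                    throw new IllegalStateException("request failed");
                }
            }
            return peak.get();
        } finally {
            clients.shutdownNow();
            server.stop(0);
            serverThreads.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak in flight: " + peakInFlight(12, 4));
    }
}
```

The semaphore plays the role of a per-host limit such as HttpClient's `setMaxConnectionsPerHost`. It bounds how many connections one client uses while still letting that many calls overlap.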
>> In short, I think that an HTTP transport is great for playing with, but
>> I don't think you can assume it will work as the primary transport.
>
> I agree, we cannot assume it.  But it's easy to try it and see how it
> fares.  Any investment in getting it working is perhaps not wasted,
> since, besides providing a performance baseline, it also may be useful
> to provide HTTP-based access to services even if a higher-performance
> option is implemented.

Will the RPC over HTTP be transparent, so that we can replace it with a
different layer if needed?
My worry was the separation of data and checksums; someone had mentioned
that one could do this over 2 RPCs - that is not transparent.
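The transparency question can be made concrete with a small interface. If callers depend only on an abstract block-read call, the layer underneath (HTTP, a custom socket protocol, or something else) can be swapped without touching them. The sketch below is hypothetical: `BlockReader`, `BlockReply`, and `InMemoryBlockReader` are invented names, not Hadoop classes. It returns the data and its checksum in one reply, so the caller never sees a split into two RPCs. It uses a single CRC32 over the requested range as a simplification.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class TransparentReadDemo {

    /** One reply carries both the bytes and their checksum. */
    public static final class BlockReply {
        final byte[] data;
        final long crc32;
        BlockReply(byte[] data, long crc32) { this.data = data; this.crc32 = crc32; }
    }

    /** What callers depend on; an HTTP- or socket-based class could implement it instead. */
    public interface BlockReader {
        BlockReply read(long blockId, int offset, int length) throws IOException;
    }

    static long crc(byte[] bytes) {
        CRC32 c = new CRC32();
        c.update(bytes, 0, bytes.length);
        return c.getValue();
    }

    /** Stand-in transport that serves blocks from memory (hypothetical). */
    public static final class InMemoryBlockReader implements BlockReader {
        private final Map<Long, byte[]> blocks = new HashMap<>();

        public void put(long id, byte[] bytes) { blocks.put(id, bytes); }

        @Override
        public BlockReply read(long blockId, int offset, int length) throws IOException {
            byte[] block = blocks.get(blockId);
            if (block == null) throw new IOException("no such block: " + blockId);
            if (offset < 0 || length < 0 || offset > block.length - length) {
                throw new IOException("range out of bounds");
            }
            byte[] slice = Arrays.copyOfRange(block, offset, offset + length);
            return new BlockReply(slice, crc(slice)); // data and checksum travel together
        }
    }

    /** Caller-side code: verifies the checksum without knowing which transport served it. */
    public static byte[] readVerified(BlockReader reader, long blockId, int offset, int length)
            throws IOException {
        BlockReply reply = reader.read(blockId, offset, length);
        if (crc(reply.data) != reply.crc32) throw new IOException("checksum mismatch on block " + blockId);
        return reply.data;
    }

    public static void main(String[] args) throws IOException {
        InMemoryBlockReader reader = new InMemoryBlockReader();
        reader.put(7L, "hello, hdfs".getBytes(StandardCharsets.US_ASCII));
        byte[] got = readVerified(reader, 7L, 0, 5);
        System.out.println(new String(got, StandardCharsets.US_ASCII)); // prints "hello"
    }
}
```

Because `readVerified` sees only `BlockReader`, an HTTP-backed or socket-backed implementation could replace `InMemoryBlockReader` without changing the caller. That is the kind of transparency being asked about here.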

Also, the other issue is porting from data transfer socket streams to RPC -
that port will not be transparent. We cannot afford to lose performance
over that change. Further, moving from streaming sockets to RPC is a very
significant code change to the dfs-client and data nodes. I assume that we
are going to create a branch that moves the data transfer protocols to RPC,
test the performance, and, if it is good, commit and move to RPC?
I am worried about this part - I am surprised that you two are not. Am I
missing something here?
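Part of the performance worry about moving block transfer from a streaming socket to RPC is round trips. A streaming transfer sends one request for the whole block. Request/response RPC needs one call per chunk unless calls are overlapped. Here is a back-of-the-envelope sketch. It assumes a 64 MB block (the HDFS default block size at the time), a hypothetical 64 KB payload per RPC, and a hypothetical 0.5 ms round trip. None of these figures comes from this thread, and the model counts only time spent waiting on round trips, not transfer bandwidth.

```java
public class RoundTripEstimate {

    /** Number of request/response calls needed to move `totalBytes` in `chunkBytes` pieces. */
    public static long rpcCalls(long totalBytes, long chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) throw new IllegalArgumentException("bad sizes");
        return (totalBytes + chunkBytes - 1) / chunkBytes; // ceiling division
    }

    /** Time spent waiting on round trips if `window` calls are kept in flight at once. */
    public static double waitMillis(long calls, double rttMillis, int window) {
        if (window <= 0) throw new IllegalArgumentException("window must be positive");
        long batches = (calls + window - 1) / window;
        return batches * rttMillis;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // 64 MB block
        long chunk = 64L * 1024;        // hypothetical 64 KB per RPC
        long calls = rpcCalls(block, chunk);
        System.out.println(calls);                     // prints 1024
        System.out.println(waitMillis(calls, 0.5, 1)); // prints 512.0 (one call at a time)
        System.out.println(waitMillis(calls, 0.5, 8)); // prints 64.0 (eight calls in flight)
    }
}
```

Keeping several calls in flight shrinks the waiting time, which is the same overlap argument Doug makes about pooled connections. Only a benchmark on a real branch, as proposed above, would show whether that is enough.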

sanjay

>
> Doug