Re: SDP support for Hadoop RPC
Steve Loughran 2013-10-09, 08:45
On 9 October 2013 01:57, Milind Bhandarkar <[EMAIL PROTECTED]> wrote:

> Yes, we have. It works very well, but it is considered too niche by folks
> who insist on buying the least capable hardware for their test clusters,
> and therefore, recommend such underpowered clusters to customers as well.

Surely you meant to say "take advantage of the cost model of JBOD storage
and Ethernet to allow data to be stored and accessed at significantly lower
price points than for legacy storage architectures and pricing models, so
enabling their customers to store and process data they would have
previously had to discard" (0)

IB should be most interesting at the app level, for apps beyond classic MR.
That's Giraph, streaming work, Tez. I'd like to see some numbers there. As
the oracle

For storage, IB would make locality less of an issue (1,2), and instead
make the storage tier (SSD vs HDD) more significant in terms of
performance (2). There is ongoing work on this in a set of JIRAs about
multi-tier storage.

I don't know the current state of Hadoop on IB, or even whether NIO's
ByteBuffer.allocateDirect() has been picked up. For IPC there should be some
latency improvements, while for the Datanodes it's the bulk data you want to
push around faster. If you want to work on either of those problems you'd be
very welcome.
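
For anyone unfamiliar with the allocateDirect() point: a direct buffer lives
outside the Java heap, so the OS (or a native transport) can read and write it
without an extra copy through a heap array. A minimal sketch of the difference,
using only the JDK; the class and helper names are made up for illustration
and are not Hadoop code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DirectBufferSketch {

    // Hypothetical helper: allocate an off-heap (direct) buffer of the
    // given size, the kind of buffer a native transport could use for I/O.
    public static ByteBuffer newTransferBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    // Copy a payload into a direct buffer and read it back out, to show
    // the put/flip/get cycle that any NIO channel write/read relies on.
    public static byte[] roundTrip(byte[] payload) {
        ByteBuffer buf = newTransferBuffer(payload.length);
        buf.put(payload);   // write into off-heap memory
        buf.flip();         // switch the buffer from writing to reading
        byte[] out = new byte[buf.remaining()];
        buf.get(out);       // copy back onto the heap
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer direct = newTransferBuffer(64 * 1024);
        ByteBuffer heap = ByteBuffer.allocate(64 * 1024);

        // Direct buffers report isDirect() == true; heap buffers do not.
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("heap.isDirect()   = " + heap.isDirect());

        byte[] echoed = roundTrip("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(echoed, StandardCharsets.UTF_8));
    }
}
```

Direct buffers cost more to allocate than heap buffers, so the usual pattern is
to allocate them once and reuse them; whether that pays off for Hadoop IPC or
the Datanode data path is exactly the kind of thing that needs measuring.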

(0) I also have a VMware test cluster for some HA work and VM capacity from
Rackspace for a broader pool of deployment options.
(1) Hadoop 2.1 supports Unix Domain Sockets for a direct-yet-secure
connection between a local app (HBase, ...) and the Datanode. This bypasses
the network stack entirely.
(2) http://www.cs.berkeley.edu/%7Eganesha/disk-irrelevant_hotos2011.pdf

