Re: SDP support for Hadoop RPC
Milind Bhandarkar 2013-10-09, 09:09
No, Steve, I meant exactly what I wrote. One day, when you are here, let's meet in the same Chinese restaurant, and I will give you the numbers on performance and cost, and let us do the division of these two numbers.
Let's talk about how NN latency becomes a bottleneck for the rest of the cluster's throughput, and why the networking world's advances cannot be swept under the rug.
Let's talk about why your employers are cozying up to DSSD and Engenio while you and others in open source are insisting that 1GbE and DAS SATA disks are the most suitable for Hadoop.
And most of all, let's chat about why the business aspects of Hadoop are acting against the open source work of folks from the same orgs.
Tomorrow and the day after, we are conducting a big data benchmarking workshop in San Jose, where your partners and other open-core Hadoop companies' partners will demonstrate how advanced hardware (CPUs, networks, storage) is more cost-effective than what you are recommending.
I recognized this phenomenon very early, and wrote a blog post comparing open source Hadoop development to Charlie Chaplin, who missed out on color and talkie movie technology by sticking to silent black-and-white technology. I know your employer has moved beyond cheap hardware, based on what I hear from customers where we compete. I am wondering why you still keep insisting that new technologies are not worth it.
> On Oct 9, 2013, at 1:45, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 9 October 2013 01:57, Milind Bhandarkar <[EMAIL PROTECTED]> wrote:
>> Yes, we have. It works very well, but it is considered too niche by folks
>> who insist on buying the least capable hardware for their test clusters,
>> and therefore, recommend such underpowered clusters to customers as well.
> Surely you meant to say "take advantage of the cost model of JBOD storage
> and ethernet to allow data to be stored and accessed at significantly lower
> price points than for legacy storage architectures and pricing models, so
> enabling their customers to store and process data they would have
> previously had to discard" (0)
> IB should be most interesting at the app level, for apps beyond classic MR.
> That's Giraph, streaming work, Tez. I'd like to see some numbers there. As
> the oracle
> For storage, IB would make locality less of an issue (1,2), and instead
> make the storage tier (SSD vs HDD) more significant in terms of
> performance (2). There is ongoing work there in a set of JIRAs about
> multi-tier storage.
> I don't know the current state of Hadoop on IB, or even if allocateDirect()
> of NIO has been picked up. For IPC there should be some latency
> improvements, while for the Datanodes it's the bulk data you want to push
> around faster. If you want to work on either of those problems you'd be
> very welcome.
> (0) I also have a VMware test cluster for some HA work and VM capacity from
> Rackspace for a broader pool of deployment options.
> (1) Hadoop 2.1 supports Unix Domain Sockets for a direct-yet-secure
> connection between a local app (HBase, ...) and the Datanode. This bypasses
> the network stack entirely.
> (3) http://www.cs.berkeley.edu/%7Eganesha/disk-irrelevant_hotos2011.pdf