RE: risks of using Hadoop


I think it is arrogant to parrot FUD when you've never gotten your hands dirty in any real Hadoop environment.
So how could your response reflect the operational realities of running a Hadoop cluster?

What Brian was saying was that the SPOF is an overplayed FUD trump card.
Anyone who's built clusters will have mitigated the risks of losing the NN.
Then there's MapR... where you don't have a SPOF. But again that's a derivative of Apache Hadoop.
(Derivative isn't a bad thing...)
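To make the mitigation concrete: a common approach on Hadoop 1.x-era clusters was to point the NameNode at multiple metadata directories, typically a local RAID volume plus an NFS mount, so losing the NN host doesn't mean losing the filesystem metadata. A minimal sketch below, assuming 1.x-era property names; the paths are illustrative, not prescriptive.

```xml
<!-- hdfs-site.xml (Hadoop 1.x-era property names; paths are illustrative) -->
<configuration>
  <property>
    <!-- The NameNode writes fsimage/edits to every directory listed here;
         one local (RAID-backed) dir plus one NFS mount is the usual pairing -->
    <name>dfs.name.dir</name>
    <value>/data/dfs/name,/mnt/nfs/dfs/name</value>
  </property>
  <property>
    <!-- Where the SecondaryNameNode stores its periodic checkpoints -->
    <name>fs.checkpoint.dir</name>
    <value>/data/dfs/namesecondary</value>
  </property>
</configuration>
```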

You're right that you need to plan accordingly; however, from a risk perspective, this isn't a significant risk.
In fact, I believe Tom White's book has a good layout for mitigating this. I have the first edition; I'll have to double-check the second edition to see if he modified it.
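One concrete shape that recovery planning can take (a sketch under the assumption of a 1.x-era cluster with a SecondaryNameNode checkpoint available; hostnames and paths are hypothetical):

```shell
# On a replacement NameNode host: recover metadata from the
# SecondaryNameNode's last checkpoint (Hadoop 1.x-era commands).

# 1. Create an empty dfs.name.dir for the new NameNode
mkdir -p /data/dfs/name

# 2. Copy the checkpoint from the SNN host into fs.checkpoint.dir
scp -r snn-host:/data/dfs/namesecondary /data/dfs/

# 3. Import the checkpoint and bring the NameNode back up
hadoop namenode -importCheckpoint
```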

Again, the point Brian was making, and one that I agree with, is that the NN as a SPOF is an overblown 'risk'.

You have a greater chance of data loss than you do of losing your NN.

Probably the reason why some of us are a bit irritated by the SPOF reference to the NN is that it's clowns who haven't done any work in this space who pick up on the FUD and spread it around. This makes it difficult for guys like me to get anything done, because we constantly have to go back and reassure stakeholders that it's a non-issue.

With respect to naming vendors, I did name MapR outside of Apache because they do have their own derivative release that improves upon the limitations found in Apache's Hadoop.

PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your cluster and you will lose everything. Now would you consider this a risk? Sure. But is it something you should really lose sleep over? Do you understand that there are risks and there are improbable risks?
> Subject: RE: risks of using Hadoop
> Date: Tue, 20 Sep 2011 12:48:05 -0700
> No worries Michael - it would be a stretch to see any arrogance or
> disrespect in your response.
> Kobina has asked a fair question, and deserves a response that reflects
> the operational realities of where we are.
> If you are looking at doing large scale CDR handling - which I believe is
> the use case here - you need to plan accordingly. Even you use the term
> "mitigate" - which is different than "prevent".  Kobina needs an
> understanding of what they are looking at. That isn't a pro/con stance on
> Hadoop, it is just reality and they should plan accordingly.
> (Note - I'm not the one who brought vendors into this - which doesn't
> strike me as appropriate for this list)
> ------------------------------------------------
> Tom Deutsch
> Program Director
> CTO Office: Information Management
> Hadoop Product Manager / Customer Exec
> 3565 Harbor Blvd
> Costa Mesa, CA 92626-1420
> Michael Segel <[EMAIL PROTECTED]>
> 09/17/2011 07:37 PM
> Subject: RE: risks of using Hadoop
> Gee Tom,
> No disrespect, but I don't believe you have any personal practical
> experience in designing and building out clusters or putting them to the
> test.
> Now to the points that Brian raised..
> 1) SPOF... it sounds great on paper. Some FUD to scare someone away from
> Hadoop. But in reality... you can mitigate your risks by setting up RAID
> on your NN/HM node. You can also NFS mount a copy to your SN (or whatever
> they're calling it these days...) Or you can go to MapR, whose redesigned
> HDFS removes this problem. But with Apache Hadoop or
> Cloudera's release, losing your NN is rare. Yes it can happen, but not
> your greatest risk. (Not by a long shot)
> 2) Data Loss.
> You can mitigate this as well. Do I need to go through all of the options
> and DR/BCP planning? Sure there's always a chance that you have some Luser